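Guanghua Lecture Series: Forum of Distinguished Public Figures and Entrepreneurs, Session 5506
Topic: Model selection and combination for estimating treatment effects
Speaker: Professor Yuhong Yang, University of Minnesota
Host: Professor Huazhen Lin, Center of Statistical Research, School of Statistics
Time: July 23, 2019, 2:00-3:00 p.m.
Venue: Room E102, Jingshi Building, Liulin Campus, Southwestern University of Finance and Economics
Organizers: Center of Statistical Research, School of Statistics, Office of Scientific Research
About the speaker: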
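Professor Yuhong Yang received his B.S. in mathematics from the University of Science and Technology of China in 1988, his M.S. in statistics from the University of Illinois in 1993, and his Ph.D. in statistics from Yale University in 1996. He is currently a professor in the Department of Statistics at the University of Minnesota, where he also serves as Director of Graduate Studies. He received the U.S. National Science Foundation's CAREER Award for outstanding young faculty (NSF CAREER Award), an honor that goes to only one or two scholars each year, and was elected a Fellow of the Institute of Mathematical Statistics in 2010. He has served as principal investigator on four U.S. National Science Foundation grants. His research interests include the theory of high-dimensional data analysis, model selection and combination, multi-armed bandit problems, statistics for precision medicine, and forecasting. He has established many important and profound theories and methods in these areas and has published more than 70 papers, 18 of them as sole author. These papers have appeared in leading journals in statistics, machine learning, information theory, econometrics, forecasting, and approximation theory, including Annals of Statistics, JASA, Biometrika, JRSSB, IEEE Transactions on Information Theory, Journal of Econometrics, Journal of Approximation Theory, Journal of Machine Learning Research, and International Journal of Forecasting, and have been cited more than 4,000 times on Google Scholar.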
Abstract:
It is increasingly clear that a treatment’s effect on a response may be heterogeneous with respect to baseline covariates (including possible genetic information). This is an important premise of personalized medicine. Several methods for estimating heterogeneous treatment effects have been proposed. However, little attention has been given to the problem of choosing between estimators of treatment effects. Models that best estimate the regression function may not be best for estimating the effect of a treatment; therefore, there is a need for model selection methods that are targeted to treatment effect estimation. We develop a treatment effect cross-validation aimed at minimizing treatment effect estimation errors. Theoretically, treatment effect cross-validation has a model selection consistency property when the data splitting ratio is properly chosen. Practically, treatment effect cross-validation has the flexibility to compare different types of models. We illustrate the methods by using simulation studies and data from a clinical trial comparing treatments of patients with human immunodeficiency virus.
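To make the idea of validating on treatment effects (rather than on the response) concrete, here is a minimal sketch. It is not the speaker's procedure: the nearest-neighbor matching rule, the two candidate estimators, the simulated data, and the 50/50 split are all assumptions chosen for illustration. On the validation half, each treated unit is paired with the control unit closest in the covariate, and the difference in their responses serves as a noisy proxy for the treatment effect against which each candidate is scored.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def fit_constant_effect(train):
    """Hypothetical candidate 1: a single effect (difference of arm means)."""
    y1 = [y for x, t, y in train if t == 1]
    y0 = [y for x, t, y in train if t == 0]
    effect = sum(y1) / len(y1) - sum(y0) / len(y0)
    return lambda x: effect

def fit_linear_effect(train):
    """Hypothetical candidate 2: separate lines per arm; effect is their difference."""
    a1, b1 = fit_line([x for x, t, y in train if t == 1],
                      [y for x, t, y in train if t == 1])
    a0, b0 = fit_line([x for x, t, y in train if t == 0],
                      [y for x, t, y in train if t == 0])
    return lambda x: (a1 - a0) + (b1 - b0) * x

def matched_validation_error(tau_hat, valid):
    """Score an effect estimator against matched treated-minus-control proxies."""
    treated = [(x, y) for x, t, y in valid if t == 1]
    control = [(x, y) for x, t, y in valid if t == 0]
    total = 0.0
    for x1, y1 in treated:
        # Assumed matching rule: nearest control unit in the covariate.
        x0, y0 = min(control, key=lambda c: abs(c[0] - x1))
        total += ((y1 - y0) - tau_hat(x1)) ** 2
    return total / len(treated)

def simulate(n, rng):
    """Simulated trial (an assumption): true effect 4*x, so it varies with x."""
    data = []
    for _ in range(n):
        x = rng.random()
        t = rng.randint(0, 1)
        y = 1.0 + x + t * 4.0 * x + rng.gauss(0.0, 0.5)
        data.append((x, t, y))
    return data

rng = random.Random(0)
data = simulate(400, rng)
train, valid = data[:200], data[200:]  # 50/50 split, chosen arbitrarily here

candidates = {
    "constant": fit_constant_effect(train),
    "linear": fit_linear_effect(train),
}
scores = {name: matched_validation_error(f, valid) for name, f in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "selected:", best)
```

Because the proxy's own noise enters every candidate's score equally, only the ranking of the scores matters, not their absolute size.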
When estimating conditional treatment effects, the currently dominating practice is to select a statistical model or procedure based on sample data. However, because finding out the best model can be very difficult due to limited information, combining estimates from the candidate procedures often provides a more accurate and much more stable estimate than the selection of a single procedure. We propose a method of model combination that targets accurate estimation of the treatment effect conditional on covariates. We provide a risk bound for the resulting estimator under squared error loss and illustrate the method using data from a labor skills training program.
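As a rough illustration of combining rather than selecting, the sketch below averages several candidate treatment effect estimates with weights that decay exponentially in each candidate's held-out error. The exponential weighting scheme, the `temperature` parameter, and all numbers are assumptions made up for this example; the talk's actual weighting method and its risk bound are not reproduced here.

```python
import math

def exponential_weights(errors, temperature=1.0):
    """Turn held-out errors into weights that sum to 1 (smaller error, larger weight)."""
    smallest = min(errors)
    # Subtracting the minimum keeps the exponentials numerically stable.
    raw = [math.exp(-(e - smallest) / temperature) for e in errors]
    total = sum(raw)
    return [r / total for r in raw]

def combine(estimates, weights):
    """Weighted average of the candidates' treatment effect estimates."""
    return sum(w * e for w, e in zip(weights, estimates))

# Made-up illustrative inputs: held-out errors of three hypothetical candidates
# and their estimated treatment effects for one covariate value.
held_out_errors = [0.8, 1.2, 2.5]
effect_estimates = [1.0, 1.4, 3.0]

weights = exponential_weights(held_out_errors)
combined = combine(effect_estimates, weights)
print(weights, combined)
```

Unlike selection, a small change in the held-out errors shifts the weights only slightly, which is one intuition for why a combined estimate can be more stable than a single selected one.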
This work is joint with Craig Rolling and Dagmar Velez.