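Guanghua Forum: Forum of Distinguished Figures and Entrepreneurs, Session 6450
Topic: To Adjust or not to Adjust? Estimating the Average Treatment Effect in Randomized Experiments with Missing Covariates
Speaker: Dr. Anqi Zhao, National University of Singapore
Host: Professor Huazhen Lin, School of Statistics
Time: Friday, April 28, 2023, 3:00-4:00 PM
Venue: Tencent Meeting, 125-569-677
Organizers: Center of Statistical Research and School of Statistics; Office of Scientific Research
About the speaker:
Anqi Zhao is an Assistant Professor in the Department of Statistics and Data Science at the National University of Singapore. She received her PhD from Harvard University in 2016 and, after a period working in management consulting, joined the National University of Singapore in 2019. Her research interests include causal inference and experimental design.
Abstract: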
Randomized experiments allow for consistent estimation of the average treatment effect based on the difference in mean outcomes without strong modeling assumptions. Appropriate use of pretreatment covariates can further improve the estimation efficiency. Missingness in covariates is nevertheless common in practice and raises an important question: should we adjust for covariates subject to missingness, and if so, how? The unadjusted difference in means is always unbiased. The complete-covariate analysis adjusts for all completely observed covariates and is asymptotically more efficient than the difference in means if at least one completely observed covariate is predictive of the outcome. Then what is the additional gain of adjusting for covariates subject to missingness? To reconcile the conflicting recommendations in the literature, we analyze and compare five strategies for handling missing covariates in randomized experiments under the design-based framework, and recommend the missingness-indicator method, a known but not widely used strategy in the literature, for its multiple advantages. First, it removes the dependence of the regression-adjusted estimators on the imputed values for the missing covariates. Second, it does not require modeling the missingness mechanism, and yields consistent estimators even when the missingness mechanism is related to the missing covariates and unobservable potential outcomes. Third, it ensures large-sample efficiency over the complete-covariate analysis and the analysis based on only the imputed covariates. Lastly, it is easy to implement via least squares. We also propose modifications to it based on asymptotic and finite-sample considerations. Importantly, our theory views randomization as the basis for inference, and does not impose any modeling assumptions on the data generating process or missingness mechanism.
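As a rough illustration of the missingness-indicator idea described in the abstract, the sketch below fills missing covariate values with an arbitrary constant, appends one indicator column per covariate that has missing entries, and fits least squares separately in each treatment arm. The abstract does not spell out the regression specification, so the arm-wise (fully interacted) fit, the function name `missingness_indicator_ate`, the `fill` parameter, and the simulated data are all assumptions made here for illustration, not the speaker's implementation.

```python
import numpy as np


def missingness_indicator_ate(y, z, x, fill=0.0):
    """Hypothetical helper: regression-adjusted ATE estimate with
    missingness indicators. Missing covariate entries are np.nan."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z).astype(bool)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]

    miss = np.isnan(x)                    # True where a covariate is missing
    x_imp = np.where(miss, fill, x)       # impute with an arbitrary constant
    has_missing = miss.any(axis=0)        # indicators only where needed
    design = np.column_stack(
        [np.ones(len(y)), x_imp, miss[:, has_missing].astype(float)]
    )

    # Assumed specification: separate least-squares fits per arm, then
    # average the predicted outcome difference over all units.
    beta_treated, *_ = np.linalg.lstsq(design[z], y[z], rcond=None)
    beta_control, *_ = np.linalg.lstsq(design[~z], y[~z], rcond=None)
    return float(np.mean(design @ beta_treated - design @ beta_control))


# Simulated randomized experiment (illustrative data, true effect = 1.5).
rng = np.random.default_rng(0)
n = 2000
x_full = rng.normal(size=n)
z = rng.permutation(n) < n // 2           # complete randomization, half treated
y = 1.0 + 2.0 * x_full + 1.5 * z + rng.normal(size=n)

# Missingness that depends on the covariate's own value.
p_missing = 0.5 / (1.0 + np.exp(-x_full))
x_obs = np.where(rng.random(n) < p_missing, np.nan, x_full)

tau_fill0 = missingness_indicator_ate(y, z, x_obs, fill=0.0)
tau_fill5 = missingness_indicator_ate(y, z, x_obs, fill=5.0)
print(tau_fill0, tau_fill5)
```

In this sketch the two estimates agree up to floating-point error, because changing the fill value only reparametrizes the column space spanned by the intercept, the imputed covariate, and its indicator. This mirrors the first advantage listed in the abstract, though only under the assumed arm-wise specification.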