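Guanghua Lecture Series: Forum of Public Figures and Entrepreneurs, No. 5747
Topic: Multiply robust estimation of causal effects under principal ignorability
Speaker: Associate Professor Peng Ding, University of California, Berkeley
Host: Professor Huazhen Lin, School of Statistics
Time: Monday, May 17, 2021, 10:00-11:00 a.m.
Streaming platform and meeting ID: Tencent Meeting, 150 885 634
Organizers: Center of Statistical Research and School of Statistics; Office of Scientific Research
About the speaker: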
Peng Ding is an Associate Professor in the Department of Statistics, UC Berkeley. He obtained his Ph.D. from the Department of Statistics, Harvard University in May 2015, and worked as a postdoctoral researcher in the Department of Epidemiology, Harvard T. H. Chan School of Public Health until December 2015. Previously, he received his B.S. (Mathematics), B.A. (Economics), and M.S. (Statistics) from Peking University.
Abstract:
Causal inference concerns not only the average effect of the treatment on the outcome but also the underlying mechanism through an intermediate variable of interest. Principal stratification characterizes such a mechanism by targeting subgroup causal effects within principal strata, which are defined by the joint potential values of an intermediate variable. Due to the fundamental problem of causal inference, principal strata are inherently latent, which makes it challenging to identify and estimate subgroup effects within them. A line of research leverages the principal ignorability assumption that the latent principal strata are mean independent of the potential outcomes conditional on the observed covariates. Under principal ignorability, we derive various nonparametric identification formulas for causal effects within principal strata in observational studies, which motivate estimators relying on the correct specification of different parts of the observed-data distribution. Appropriately combining these estimators further yields new triply robust estimators for the causal effects within principal strata. These new estimators are consistent if two of the treatment, intermediate variable, and outcome models are correctly specified, and they are locally efficient if all three models are correctly specified. We show that these estimators arise naturally from either the efficient influence functions in semiparametric theory or the model-assisted estimators in survey sampling theory. We evaluate different estimators based on their finite-sample performance through simulation, apply them to two observational studies, and implement them in an open-source software package.
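For readers new to the topic, the following is a minimal sketch of one simple estimator that principal ignorability makes possible. It is a plain principal score weighting estimator, not the triply robust estimators presented in the talk. It rests on several assumptions that are not part of the announcement: a binary intermediate variable S, monotonicity S(1) >= S(0), a randomized treatment with known probability 0.5, and a single binary covariate so that principal scores can be estimated by cell means. The function names `simulate` and `psw_complier_effect`, the stratum coding, and all numeric values in the data-generating process are hypothetical.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Simulate data satisfying monotonicity and principal ignorability (illustrative)."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n)
    # P(U = "10"), P(U = "00") given x; the remainder is P(U = "11").
    p10 = np.where(x == 1, 0.3, 0.5)
    p00 = np.where(x == 1, 0.2, 0.3)
    draw = rng.random(n)
    # Stratum code: 0 -> (S(1), S(0)) = (1, 0), 1 -> (0, 0), 2 -> (1, 1).
    u = (draw > p10).astype(int) + (draw > p10 + p00).astype(int)
    s1 = (u != 1).astype(int)
    s0 = (u == 2).astype(int)
    # Principal ignorability: given x, E[Y(1)] is the same in strata 10 and 11,
    # and E[Y(0)] is the same in strata 10 and 00.
    y1 = np.where(u == 1, 0.0, 2.0 + x) + rng.normal(size=n)
    y0 = np.where(u == 2, 3.0, 0.5 * x) + rng.normal(size=n)
    z = rng.binomial(1, 0.5, size=n)  # randomized treatment (an assumption)
    s = np.where(z == 1, s1, s0)
    y = np.where(z == 1, y1, y0)
    truth = (y1 - y0)[u == 0].mean()  # uses latent strata; unavailable in practice
    return x, z, s, y, truth


def psw_complier_effect(x, z, s, y, pi=0.5):
    """Principal score weighting estimate of E[Y(1) - Y(0) | U = 10]."""
    # p_z(x) = P(S = 1 | Z = z, X = x), estimated by cell means for binary x.
    p1x = np.array([s[(z == 1) & (x == v)].mean() for v in (0, 1)])[x]
    p0x = np.array([s[(z == 0) & (x == v)].mean() for v in (0, 1)])[x]
    e10 = p1x - p0x  # principal score of stratum 10 under monotonicity
    denom = e10.mean()
    w1 = e10 / p1x * s * z / pi
    w0 = e10 / (1.0 - p0x) * (1 - s) * (1 - z) / (1.0 - pi)
    return (w1 * y).mean() / denom - (w0 * y).mean() / denom


x, z, s, y, truth = simulate()
estimate = psw_complier_effect(x, z, s, y)
print(f"estimate = {estimate:.3f}, simulated truth = {truth:.3f}")
```

In this sketch the treatment probability is known and the principal scores are estimated without a parametric model, so only the principal ignorability and monotonicity assumptions do the identifying work. In an observational study the treatment, intermediate variable, and outcome models would all need to be estimated, which is where robustness to misspecifying one of them becomes relevant.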