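Guanghua Lecture Series: Forum of Distinguished Public Figures and Entrepreneurs, Session 5813
Topic: Identifying effects of multiple treatments in the presence of unmeasured confounding
Speaker: Assistant Professor Wang Miao (苗旺), Department of Probability and Statistics, Peking University
Host: Professor Huazhen Lin (林华珍), School of Statistics
Time: Tuesday, June 8, 2021, 11:00-12:00 (morning)
Streaming platform and meeting ID: Tencent Meeting, 499 705 280
Organizers: Center of Statistical Research and School of Statistics; Office of Scientific Research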
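About the speaker:
Wang Miao is an assistant professor in the Department of Probability and Statistics at Peking University. Miao completed undergraduate and doctoral studies at the School of Mathematical Sciences, Peking University (2008-2017), did postdoctoral research in the Department of Biostatistics at Harvard University (2017-2018), and joined Peking University in 2018. Miao's research interests include causal inference, missing data analysis, and their applications in biostatistics, epidemiology, economics, and artificial intelligence research.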
Abstract:
Identification of treatment effects in the presence of unmeasured confounding is a persistent problem in the social, biological, and medical sciences. The problem of unmeasured confounding in settings with multiple treatments is most common in statistical genetics and bioinformatics settings, where researchers have developed many successful statistical strategies without engaging deeply with the causal aspects of the problem. Recently there have been a number of attempts to bridge the gap between these statistical approaches and causal inference, but these attempts have either been shown to be flawed or have relied on fully parametric assumptions. In this paper, we propose two strategies for identifying and estimating causal effects of multiple treatments in the presence of unmeasured confounding. The auxiliary variables approach leverages auxiliary variables that are not causally associated with the outcome; in the case of a univariate confounder, our method only requires one auxiliary variable, unlike existing instrumental variable methods that would require as many instruments as there are treatments. An alternative null treatments approach relies on the assumption that at least half of the confounded treatments have no causal effect on the outcome, but does not require a priori knowledge of which treatments are null. Our identification strategies do not impose parametric assumptions on the outcome model and do not rest on estimation of the confounder. This work extends and generalizes existing work on unmeasured confounding with a single treatment, and provides a nonparametric extension of models commonly used in bioinformatics.