Professor Annie Qu, University of California, Irvine: Integrating Multisource Block-Wise Missing Data in Model Selection (Center of Statistical Research)

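Guanghua Lecture Series: Forum of Social Celebrities and Entrepreneurs, No. 5829

(Online lecture)

Topic: Integrating Multisource Block-Wise Missing Data in Model Selection

Speaker: Professor Annie Qu, University of California, Irvine

Host: Professor Huazhen Lin (林华珍), School of Statistics

Time: Monday, June 29, 2020, 11:00-12:00

Platform and meeting ID: Zoom, 985 5427 8969

Organizers: Center of Statistical Research, Joint Laboratory of Data Science and Business Intelligence, and the Research Office of the School of Statistics

Speaker biography: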

Annie Qu is Chancellor’s Professor in the Department of Statistics, University of California, Irvine. She received her Ph.D. in Statistics from the Pennsylvania State University. Her research focuses on solving fundamental issues regarding unstructured large-scale data; developing cutting-edge statistical methods and theory in machine learning, with algorithms for text sentiment analysis, automatic tagging and summarization, recommender systems, tensor imaging data, and network data analysis for complex heterogeneous data; and extracting essential information from large-volume, high-dimensional data. Her research has had an impact in many fields, including biomedical studies, genomic research, public health research, and the social and political sciences. Before joining UC Irvine, Dr. Qu was Data Science Founder Professor of Statistics and Director of the Illinois Statistics Office at the University of Illinois at Urbana-Champaign. She was named Brad and Karen Smith Professorial Scholar by the College of LAS at UIUC, received an NSF CAREER Award (2004-2009), and is a Fellow of the Institute of Mathematical Statistics and of the American Statistical Association.

Abstract:

For multi-source data, blocks of variable information from certain sources are likely to be missing. Existing methods for handling missing data do not take the structure of block-wise missing data into consideration. In this talk, we propose a Multiple Block-wise Imputation (MBI) approach, which incorporates imputations based on both complete and incomplete observations. Specifically, for a given missing-pattern group, the imputations in MBI incorporate more samples from groups with fewer observed variables, in addition to the group with complete observations. We propose to construct estimating equations based on all available information and to integrate informative estimating functions to achieve efficient estimators. We show that the proposed method has estimation and model selection consistency under both fixed-dimensional and high-dimensional settings. Moreover, the proposed estimator is asymptotically more efficient than the estimator based on a single imputation from complete observations only. In addition, the proposed method is not restricted to data that are missing completely at random. Numerical studies and an application to ADNI data confirm that the proposed method outperforms existing variable selection methods under various missing mechanisms. This is joint work with Fei Xue of the University of Pennsylvania.
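The imputation idea in the abstract can be illustrated with a small toy sketch. It is not the speakers' implementation: it assumes plain least-squares conditional-mean imputation, simulated Gaussian covariates, and a hypothetical layout of three sources and three missing-pattern groups (`g1`, `g2`, `g3`). It omits the estimating equations, their integration, and the variable selection step entirely. All names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layout: three sources, two covariates each.
blocks = {1: [0, 1], 2: [2, 3], 3: [4, 5]}

# Hypothetical missing-pattern groups: the sources each group observes.
observed = {"g1": {1, 2, 3}, "g2": {1, 2}, "g3": {1, 3}}


def simulate(n, p=6, rho=0.5):
    """Draw n rows of correlated Gaussian covariates (toy data)."""
    idx = np.arange(p)
    cov = rho ** np.abs(np.subtract.outer(idx, idx))
    return rng.multivariate_normal(np.zeros(p), cov, size=n)


def cols(sources):
    """Column indices belonging to a set of sources, in sorted order."""
    return [j for s in sorted(sources) for j in blocks[s]]


data = {}
for g, srcs in observed.items():
    x = simulate(200)
    hidden = np.ones(6, dtype=bool)
    hidden[cols(srcs)] = False
    x[:, hidden] = np.nan  # hide the blocks this group does not observe
    data[g] = x


def impute(target, missing_src, predictor_srcs, donors):
    """Fill one missing block of `target` with a least-squares fit on `donors`."""
    xc, yc = cols(predictor_srcs), cols({missing_src})
    donor = np.vstack([data[g] for g in donors])
    design = np.column_stack([np.ones(len(donor)), donor[:, xc]])
    coef, *_ = np.linalg.lstsq(design, donor[:, yc], rcond=None)
    filled = data[target].copy()
    target_design = np.column_stack([np.ones(len(filled)), filled[:, xc]])
    filled[:, yc] = target_design @ coef
    return filled


# Group g2 lacks source 3. Two imputations of that block:
# (a) predictors are sources 1 and 2; donors are the complete group only.
imp_a = impute("g2", 3, {1, 2}, ["g1"])
# (b) predictor is source 1 only; donors are g1 plus g3, which gives
#     more samples at the cost of fewer predictor variables.
imp_b = impute("g2", 3, {1}, ["g1", "g3"])
```

In this sketch, `imp_a` corresponds to a single imputation from complete observations, while `imp_b` borrows additional samples from a group that observes fewer variables. The abstract describes combining the information from such imputations through estimating equations, a step this sketch does not attempt.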
