Prof. Yanlin Tang, East China Normal University: Distribution-free prediction bands for clustered data with missing responses

Center for Statistical Research

Guanghua Forum: Public Figures and Entrepreneurs Forum, Session [number omitted in the original]

Topic: Distribution-free prediction bands for clustered data with missing responses

Speaker: Prof. Yanlin Tang (唐炎林), East China Normal University

Host: Prof. Huazhen Lin (林华珍), School of Statistics

Time: Tuesday, November 28, 2023, 16:00-17:00

Venue: Conference Room 408, Hongyuan Building, Liulin Campus

Organizers: Center for Statistical Research and School of Statistics Research Office

About the speaker:

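Yanlin Tang is a professor and doctoral supervisor in the School of Statistics at East China Normal University, where he chairs the Department of Statistics. He was selected for the National High-Level Young Talent Program (Organization Department). He received his PhD from the Department of Statistics at Fudan University in January 2012, joined Tongji University in May of the same year, and moved to East China Normal University in January 2019. His main research interests are quantile regression, high-dimensional statistical inference, and statistical modeling of incomplete data. He has led several projects funded by the National Natural Science Foundation of China and the Natural Science Foundation of Shanghai, and serves on the editorial boards of the SCI journals Statistica Sinica and Journal of the Korean Statistical Society. He has published more than 30 papers in journals including Biometrika, JRSSB, PNAS, and Biometrics.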

Abstract:

Existing methods for clustered data with missing responses often rely on strong model assumptions and are therefore prone to model misspecification. We construct prediction bands for the whole trajectories of new subjects based on conformal inference, yielding covariate-dependent prediction bands with finite-sample coverage guarantees, without any assumptions on the model specification or the within-cluster dependence structure. We first reduce the clustered data to independent cross-sectional data by subsampling, and then propose three weighted conformal methods to produce prediction regions. To make use of the correlation information in the clustered data, we repeat the subsampling and conformal inference and combine the resulting dependent p-values into an integrated prediction region. Among the three proposed methods, the weighted CD-split method yields the smallest prediction region by converging to the highest density set, and provides asymptotic conditional coverage guarantees for each given subject. Simulations show that our methods have excellent finite-sample behavior under a range of complex error distributions compared with the alternatives. Practical use is demonstrated on the two motivating data sets, serum cholesterol data and CD4+ cell data.

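For readers new to conformal inference, the subsample, conform, and combine idea in the abstract can be illustrated with a deliberately simplified Python sketch. It is not the speaker's method: it assumes complete responses (so no weighting for missingness), a single response per new subject rather than a whole trajectory, an absolute-residual conformity score, and a pre-fitted `predict` function trained on separate clusters. The rule for combining dependent p-values (twice their arithmetic mean, which stays valid under arbitrary dependence) is also a choice made for this sketch. All function names are hypothetical.

```python
import random


def conformal_pvalue(cal_scores, test_score):
    """Split-conformal p-value: share of scores (calibration plus test) >= the test score."""
    n_ge = sum(1 for s in cal_scores if s >= test_score)
    return (n_ge + 1) / (len(cal_scores) + 1)


def subsample_one_per_cluster(clusters, rng):
    """Draw one (x, y) pair from each cluster, giving independent cross-sectional data."""
    return [rng.choice(obs) for obs in clusters]


def combined_pvalue(clusters, predict, x_new, y_candidate, n_repeats=50, seed=0):
    """Repeat subsampling and conformal inference, then combine the dependent p-values.

    Combination rule (an assumption of this sketch): twice the arithmetic mean,
    capped at 1, which is a valid p-value under arbitrary dependence.
    """
    rng = random.Random(seed)
    test_score = abs(y_candidate - predict(x_new))
    pvals = []
    for _ in range(n_repeats):
        sample = subsample_one_per_cluster(clusters, rng)
        cal_scores = [abs(y - predict(x)) for x, y in sample]
        pvals.append(conformal_pvalue(cal_scores, test_score))
    return min(1.0, 2.0 * sum(pvals) / len(pvals))


def prediction_set(clusters, predict, x_new, y_grid, alpha=0.1, **kwargs):
    """Keep every candidate response whose combined p-value exceeds alpha."""
    return [y for y in y_grid
            if combined_pvalue(clusters, predict, x_new, y, **kwargs) > alpha]


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy clustered data: 40 clusters, 5 measurements each, with a shared cluster effect.
    clusters = []
    for _ in range(40):
        effect = rng.gauss(0, 1)
        clusters.append([(x, 2 * x + effect + rng.gauss(0, 0.5))
                         for x in (rng.uniform(0, 1) for _ in range(5))])
    predict = lambda x: 2 * x  # stand-in for a model fitted on separate training clusters
    grid = [i / 10 for i in range(-40, 61)]
    band = prediction_set(clusters, predict, x_new=0.5, y_grid=grid, alpha=0.1)
    print(min(band), max(band))  # endpoints of the prediction interval at x_new = 0.5
```

Doubling the mean p-value is conservative, so the resulting interval is typically wider than one built from a single subsample. The benefit is that the result no longer depends on which observation happened to be drawn from each cluster.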
