
Dr. Ziwei Zhu, Princeton University: Distributed Statistical Learning via Refitting Bootstrap Samples

Guanghua Forum: Social Celebrities and Entrepreneurs Forum, No.

Topic: Distributed Statistical Learning via Refitting Bootstrap Samples

Speaker: Dr. Ziwei Zhu, Princeton University

Host: Professor Huazhen Lin, School of Statistics

Time: Friday, April 14, 2023, 2:00-3:00 p.m.

Venue: Conference Room 408, Hongyuan Building, Liulin Campus

Organizers: Center of Statistical Research, School of Statistics, and the Office of Scientific Research

Speaker Bio:

Dr. Ziwei Zhu was an assistant professor of statistics at the University of Michigan, Ann Arbor (UMich) from 2019 to 2022. Prior to UMich, he was a research associate at the Statistical Laboratory at the University of Cambridge from 2018 to 2019, hosted by Professor Richard J. Samworth. He obtained his Ph.D. degree from the Department of Operations Research and Financial Engineering (ORFE) at Princeton University, advised by Professor Jianqing Fan. His research interests are federated/distributed statistical learning, high-dimensional statistics, robust statistics, and missing data.



Abstract:

In this talk, I will introduce a one-shot distributed learning algorithm via refitting Bootstrap samples, which we refer to as ReBoot. Given local models fitted on multiple independent subsamples, ReBoot refits a new model on the union of the Bootstrap samples drawn from these local models. The whole procedure requires only one round of communication of model parameters. Theoretically, we analyze the statistical rate of ReBoot for generalized linear models (GLM) and noisy phase retrieval, which represent convex and non-convex problems, respectively. In both cases, ReBoot provably achieves the full-sample statistical rate whenever the subsample size is not too small. In particular, we show that the systematic bias of ReBoot, i.e., the error that is independent of the number of subsamples, is O(n^{-2}) in GLM, where n is the subsample size. This rate is sharper than that of model parameter averaging and its variants, implying that ReBoot tolerates more data splits while maintaining the full-sample rate. A simulation study demonstrates the statistical advantage of ReBoot over competing methods, including averaging and CSL (Communication-efficient Surrogate Likelihood) with one round of gradient communication. Finally, we propose FedReBoot, an iterative version of ReBoot, to aggregate convolutional neural networks for image classification; it exhibits substantial superiority over FedAvg within early rounds of communication.
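The abstract describes the ReBoot procedure concretely enough to sketch: fit locally, communicate parameters once, draw Bootstrap samples from each communicated local model, and refit on their union. Below is a minimal illustration for the Gaussian linear model, a special case of a GLM. It is a sketch, not the paper's implementation: it assumes the server knows the design distribution (standard normal here) and the noise level, and the helper names fit_ols and reboot are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ols(X, y):
    # Ordinary least squares fit (stands in for general GLM fitting)
    return np.linalg.lstsq(X, y, rcond=None)[0]

def reboot(local_betas, n_boot, dim, noise_sd=1.0):
    """One-shot ReBoot aggregation (sketch): draw a synthetic Bootstrap
    sample from each communicated local model, pool the samples, and
    refit a single global model on the union."""
    Xs, ys = [], []
    for beta in local_betas:
        X = rng.standard_normal((n_boot, dim))   # assumed N(0, I) design
        y = X @ beta + noise_sd * rng.standard_normal(n_boot)
        Xs.append(X)
        ys.append(y)
    return fit_ols(np.vstack(Xs), np.concatenate(ys))

# Toy usage: K machines, each holding n local observations
K, n, d = 10, 200, 5
beta_true = np.ones(d)
local_betas = []
for _ in range(K):
    X = rng.standard_normal((n, d))
    y = X @ beta_true + rng.standard_normal(n)
    local_betas.append(fit_ols(X, y))            # one round: ship beta_k only

beta_reboot = reboot(local_betas, n_boot=n, dim=d)
beta_avg = np.mean(local_betas, axis=0)          # naive parameter averaging
print("ReBoot error:   ", np.linalg.norm(beta_reboot - beta_true))
print("Averaging error:", np.linalg.norm(beta_avg - beta_true))
```

Note that the pooled refit uses only the K communicated coefficient vectors, so this sketch respects the same one-round communication budget as naive parameter averaging.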


