Associate Professor Shaogao Lv (吕绍高), Nanjing Audit University: Debiased distributed learning for sparse partial linear models in high dimensions - Center for Statistical Research

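Topic: Debiased distributed learning for sparse partial linear models in high dimensions

Speaker: Associate Professor Shaogao Lv (吕绍高), Nanjing Audit University

Host: Professor Huazhen Lin (林华珍), Center for Statistical Research, School of Statistics

Time: Tuesday, November 26, 2019, 4:00-5:00 p.m.

Venue: Conference Room 408, Hongyuan Building, Liulin Campus, Southwestern University of Finance and Economics

Organizers: Center for Statistical Research; School of Statistics; Office of Scientific Research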

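About the speaker:

Shaogao Lv is currently an associate professor at Nanjing Audit University. He received his PhD in 2011 from the joint doctoral program of the University of Science and Technology of China and City University of Hong Kong, and worked at Southwestern University of Finance and Economics from 2011 to 2018. His main research area is statistical machine learning; his current interests include distributed learning, statistical inference for randomized algorithms, and the theoretical analysis of deep learning. To date he has published more than 20 papers in SCI-indexed journals, including the Annals of Statistics, the Journal of Machine Learning Research, and the Journal of Econometrics. He has led three projects funded by the National Natural Science Foundation of China. He has long served as a program committee member or reviewer for leading artificial intelligence conferences, including NeurIPS, ICML, AAAI, and AISTATS.

Abstract: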

Although various distributed machine learning schemes have recently been proposed for purely linear models and fully nonparametric models, little attention has been paid to distributed optimization for semi-parametric models that combine several structures (e.g., sparsity, linearity, and nonlinearity). To address this gap, the paper proposes a new communication-efficient distributed learning algorithm for sparse partially linear models with an increasing number of features. The method follows the classical divide-and-conquer strategy for handling big data: the computation on each subsample consists of debiasing a double-regularized least squares estimate. We prove theoretically that the resulting global parametric estimator achieves the optimal parametric rate in the semi-parametric model, given an appropriate partition of the full data. Specifically, the choice of data partition depends on the underlying smoothness of the nonparametric component, and it is adaptive to the sparsity parameter. Finally, simulation experiments illustrate the empirical performance of the debiasing technique in the distributed setting.
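The divide-and-conquer idea in the abstract can be illustrated with a minimal sketch. This is not the speaker's algorithm: it assumes a partially linear model y = x'beta + f(t) + noise, replaces the double-regularized estimator with a simpler stand-in (partialling out f with a ridge-penalized polynomial basis, then a Lasso fit by ISTA), and uses a ridge-regularized inverse covariance for the debiasing step. All function names, tuning values, and the simulated data are hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Lasso by proximal gradient for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - step * grad, step * lam)
    return b

def partial_out(T, A, degree=5, ridge=1e-3):
    """Remove the part of A explained by a polynomial basis in t.
    Assumption: a ridge-penalized polynomial stands in for the smoothness penalty."""
    B = np.vander(T, degree + 1, increasing=True)
    coef = np.linalg.solve(B.T @ B + ridge * np.eye(degree + 1), B.T @ A)
    return A - B @ coef

def local_debiased(X, T, y, lam, ridge=1e-3):
    """Debiased sparse estimate of beta computed on one subsample."""
    n, p = X.shape
    Xr, yr = partial_out(T, X), partial_out(T, y)
    b = lasso_ista(Xr, yr, lam)
    # Assumption: ridge-regularized inverse covariance as the debiasing matrix.
    Theta = np.linalg.inv(Xr.T @ Xr / n + ridge * np.eye(p))
    return b + Theta @ Xr.T @ (yr - Xr @ b) / n

def distributed_debiased(X, T, y, m, lam):
    """Split the data into m subsamples and average the local debiased estimates."""
    parts = np.array_split(np.arange(len(y)), m)
    local = [local_debiased(X[i], T[i], y[i], lam) for i in parts]
    return np.mean(local, axis=0)

# Simulated data (hypothetical): sparse beta plus a smooth nonlinear component.
rng = np.random.default_rng(0)
n, p, m = 2000, 20, 4
X = rng.normal(size=(n, p))
T = rng.uniform(size=n)
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + np.sin(2 * np.pi * T) + 0.5 * rng.normal(size=n)

beta_hat = distributed_debiased(X, T, y, m=m, lam=0.1)
print(np.round(beta_hat[:5], 2))
```

Only the m local estimates of length p are combined, so a single round of communication suffices. Debiasing before averaging matters because the shrinkage bias of each local Lasso fit would otherwise not average out across machines.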
