Guanghua Forum: Celebrities and Entrepreneurs Forum, Session 5665
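Topic: A race-DC in Big Data
Speaker: Professor Lin Lu, Shandong University
Host: Professor Lin Huazhen, Center of Statistical Research, School of Statistics
Time: Friday, December 13, 2019, 4:00-5:00 p.m.
Venue: Conference Room 408, Hongyuan Building, Liulin Campus, Southwestern University of Finance and Economics
Organizers: Center of Statistical Research, School of Statistics, Office of Scientific Research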
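About the speaker:
Lin Lu is a professor, doctoral supervisor, and associate dean at the Institute for Financial Studies, Shandong University. After earning a doctorate from Nankai University, Lin taught first at Nankai University and then moved to Shandong University, holding a post there ever since. Lin's research covers high-dimensional statistics, nonparametric and semiparametric statistics, and financial statistics, with more than 100 research papers published in top international journals in statistics, machine learning, and related applied fields, including Annals of Statistics and Journal of Machine Learning Research, as well as in other important journals. Lin has led multiple projects funded by the National Natural Science Foundation of China, special fund projects for doctoral programs, and key projects of the Shandong Provincial Natural Science Foundation. Awards include first and second prizes for Progress in Statistical Science and Technology from the National Bureau of Statistics and a first prize for Outstanding Teaching Achievement of Shandong Province. Lin is a core member of a National 973 Program project, a National Innovative Research Group, and a Ministry of Education Innovation Team, a member of the Ministry of Education's Steering Committee for Professional Master's Education in Applied Statistics, and a Counselor of the Shandong Provincial Government.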
Abstract:
The divide-and-combine (DC) strategy has been widely used in big data analysis. Bias correction is crucial in the DC procedure for efficiently aggregating locally biased estimators, especially when the number of data batches is large. This paper establishes a race-DC via the residual-adjustment composition estimate (race). The race-DC applies to various types of biased estimators, which include but are not limited to the Lasso estimator, the Bridge estimator, and the principal component estimator in linear regression, and the least squares estimator in nonlinear regression. The resulting global estimator is strictly unbiased under the linear model, is acceleratingly bias-reduced under the nonlinear model, and can achieve theoretical optimality when the number of data batches is large. Moreover, the race-DC is computationally simple because it is a least squares estimator in a pro forma linear regression. Detailed simulation studies demonstrate that the resulting global estimator is significantly bias-corrected, and that its behavior is comparable to that of the oracle estimator and much better than that of its competitors.
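
To illustrate why bias correction matters in the DC procedure, here is a minimal sketch of a naive DC scheme that simply averages locally biased estimators. It is not the race-DC method from the talk. The single-covariate model, the ridge penalty as the source of local bias, and all names and parameter values (`ridge_1d`, `simulate`, `lam`, the batch count) are assumptions chosen for illustration only.

```python
import random


def ridge_1d(xs, ys, lam):
    """Ridge estimate for a no-intercept model y = beta * x + noise.

    With lam = 0 this is the ordinary least squares estimate;
    with lam > 0 the estimate is shrunk toward zero, hence biased.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def simulate(n=20000, n_batches=200, beta=2.0, lam=50.0, seed=0):
    """Compare naive DC averaging of biased local estimates with full-data OLS."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [beta * x + rng.gauss(0.0, 1.0) for x in xs]

    size = n // n_batches
    # Local step: a biased (ridge) estimate on each batch.
    local = [
        ridge_1d(xs[i * size:(i + 1) * size], ys[i * size:(i + 1) * size], lam)
        for i in range(n_batches)
    ]
    # Combine step: plain averaging reduces variance but leaves the bias in place.
    naive_dc = sum(local) / n_batches
    # Reference: least squares on the full data set (lam = 0).
    full_ols = ridge_1d(xs, ys, 0.0)
    return naive_dc, full_ols


naive_dc, full_ols = simulate()
print(f"true beta = 2.0, naive DC = {naive_dc:.3f}, full-data OLS = {full_ols:.3f}")
```

In this setup each batch holds only 100 observations, so the penalty shrinks every local estimate by roughly the same factor, and averaging over 200 batches cannot remove that common shrinkage. Under these assumptions, this is the kind of systematic error that a bias-corrected aggregation is meant to address.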