
Professor Zemin Zheng, University of Science and Technology of China: Statistically Guided Divide-and-Conquer for Sparse Factorization of Large Matrix

Guanghua Forum: Distinguished Public Figures and Entrepreneurs Forum, Session No. __

Topic: Statistically Guided Divide-and-Conquer for Sparse Factorization of Large Matrix

Speaker: Professor Zemin Zheng, University of Science and Technology of China

Host: Professor Huazhen Lin, School of Statistics

Time: Wednesday, March 15, 2023, 10:30-11:30 a.m.

Venue: Tencent Meeting, ID 289-443-584

Organizers: Center of Statistical Research and School of Statistics; Office of Scientific Research

About the speaker:

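Zemin Zheng received his Ph.D. in applied mathematics from the University of Southern California in 2015. He is currently a professor in the School of Management at the University of Science and Technology of China, chair of the Department of Statistics and Finance, and a doctoral supervisor. His research focuses on high-dimensional statistical inference and big data problems. Dr. Zheng has produced creative results on several key research topics spanning this field, published in top international journals in statistics, machine learning, econometrics, and management optimization, including Journal of the Royal Statistical Society: Series B (JRSSB), Annals of Statistics (AOS), Operations Research (OR), Journal of Machine Learning Research (JMLR), and Journal of Business & Economic Statistics (JBES). He has received an outstanding research award from the University of Southern California and a new researcher award from the Institute of Mathematical Statistics, and in 2017 he was selected for the Young Innovative Talents Program of the Organization Department of the CPC Central Committee.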


Abstract

The sparse factorization of a large matrix is fundamental in modern statistical learning. In particular, the sparse singular value decomposition and its variants have been used in multivariate regression, factor analysis, biclustering, and vector time series modeling, among other areas. The appeal of this factorization lies in its power to discover a highly interpretable latent association network, either between samples and variables or between responses and predictors. However, many existing methods are either ad hoc, with no general performance guarantee, or computationally intensive, which makes them unsuitable for large-scale studies. We formulate the statistical problem as a sparse factor regression and tackle it with a divide-and-conquer approach. In the first stage of division, we consider both sequential and parallel approaches for simplifying the task into a set of co-sparse unit-rank estimation (CURE) problems, and establish the statistical underpinnings of these commonly adopted yet poorly understood deflation methods. In the second stage of division, we innovate a contended stagewise learning technique, consisting of a sequence of simple incremental updates, to efficiently trace out the whole solution paths of CURE. Our algorithm has a much lower computational complexity than alternating convex search, and the choice of the step size enables a flexible and principled tradeoff between statistical accuracy and computational efficiency. Our work is among the first to enable stagewise learning for non-convex problems, and the idea may apply to many multi-convex problems. Extensive simulation studies and an application in genetics demonstrate the effectiveness and scalability of our approach.
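The sequential deflation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the speaker's CURE estimator or the contended stagewise algorithm: as a stand-in for the unit-rank step it uses a plain soft-thresholded alternating power iteration (closer in spirit to the alternating search the abstract compares against), and it factorizes a matrix directly rather than working in the regression setting. The function names and the penalty parameters `lam_u` and `lam_v` are hypothetical, chosen only for illustration.

```python
import numpy as np


def soft_threshold(z, lam):
    """Elementwise soft-thresholding: shrink toward zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def sparse_rank_one(M, lam_u, lam_v, n_iter=100, tol=1e-8):
    """Estimate one sparse layer M ~ d * outer(u, v).

    Illustrative stand-in for a unit-rank step: alternating
    soft-thresholded power iterations, started from the leading
    singular vectors of M.
    """
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    u, v = U[:, 0], Vt[0]
    for _ in range(n_iter):
        u_new = soft_threshold(M @ v, lam_u)
        if not np.any(u_new):          # penalty removed every entry
            return 0.0, np.zeros(M.shape[0]), np.zeros(M.shape[1])
        u_new /= np.linalg.norm(u_new)
        v_new = soft_threshold(M.T @ u_new, lam_v)
        if not np.any(v_new):
            return 0.0, np.zeros(M.shape[0]), np.zeros(M.shape[1])
        v_new /= np.linalg.norm(v_new)
        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break
    d = float(u @ M @ v)               # strength of the layer
    return d, u, v


def sequential_deflation(M, rank, lam_u, lam_v):
    """Extract up to `rank` sparse layers, subtracting each from the residual."""
    residual = np.array(M, dtype=float)
    layers = []
    for _ in range(rank):
        d, u, v = sparse_rank_one(residual, lam_u, lam_v)
        if d == 0.0:
            break
        layers.append((d, u, v))
        residual = residual - d * np.outer(u, v)   # deflate
    return layers
```

Each pass solves one unit-rank problem on the current residual and then removes the fitted layer, which is the "sequential" form of division; a parallel variant would instead fit the layers independently from an initial estimate.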



