Professor Jinchi Lv, University of Southern California: "RANK: Large-Scale Inference with Graphical Nonlinear Knockoffs" - Center of Statistical Research

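Guanghua Forum: Celebrities and Entrepreneurs Forum, Session 5482

Topic: "RANK: Large-Scale Inference with Graphical Nonlinear Knockoffs"

Speaker: Professor Jinchi Lv, University of Southern California

Host: Associate Professor Guo Bin

Time: Friday, July 5, 2019, 9:00-10:00 a.m.

Venue: Conference Room 408, Hongyuan Building, Liulin Campus, Southwestern University of Finance and Economics

Organizers: Center of Statistical Research, School of Statistics, Office of Scientific Research

Speaker biography: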

Jinchi Lv is Kenneth King Stonier Chair in Business Administration and Professor in the Data Sciences and Operations Department of the Marshall School of Business at the University of Southern California, Professor in the Department of Mathematics at USC, and an Associate Fellow of the USC Dornsife Institute for New Economic Thinking (INET). He received his Ph.D. in Mathematics from Princeton University in 2007. He was McAlister Associate Professor in Business Administration at USC from 2016 to 2019. His research interests include statistics, machine learning, data science, and business applications.

His papers have been published in journals in statistics, economics, computer science, information theory, and biology, and one of them was published as a Discussion Paper in the Journal of the Royal Statistical Society Series B (2008). He is a Fellow of the Institute of Mathematical Statistics (2019) and the recipient of the USC Marshall Dean's Award for Research Impact (2017), the Adobe Data Science Research Award (2017), the Royal Statistical Society Guy Medal in Bronze (2015), the NSF Faculty Early Career Development (CAREER) Award (2010), the USC Marshall Dean's Award for Research Excellence (2009), and the Zumberge Individual Award from USC's James H. Zumberge Faculty Research and Innovation Fund (2008). He has served as an associate editor of the Annals of Statistics (2013-2018), the Journal of Business & Economic Statistics (2018-present), and Statistica Sinica (2008-2016).

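Jinchi Lv is a professor in the Data Sciences and Operations Department of the Marshall School of Business at the University of Southern California, a professor in the USC Department of Mathematics, and an Associate Fellow of the USC Dornsife Institute for New Economic Thinking (INET). He received his Ph.D. in Mathematics from Princeton University in 2007. From 2016 to 2019 he was McAlister Associate Professor in Business Administration at USC. His research interests include statistics, machine learning, data science, and business applications. His papers have appeared in journals in statistics, economics, computer science, information theory, and biology, and one of them was published as a Discussion Paper in the Journal of the Royal Statistical Society Series B (2008). He was named a Fellow of the Institute of Mathematical Statistics in 2019, received the USC Marshall Dean's Award for Research Impact and the Adobe Data Science Research Award in 2017, the Royal Statistical Society Guy Medal in Bronze in 2015, the NSF Faculty Early Career Development (CAREER) Award in 2010, and the USC Marshall Dean's Award for Research Excellence in 2009, among other honors. He has served as an associate editor of the Annals of Statistics (2013-2018), the Journal of Business & Economic Statistics (2018-present), and Statistica Sinica (2008-2016).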

Abstract:

Power and reproducibility are key to enabling refined scientific discoveries in contemporary big data applications with general high-dimensional nonlinear models. In this paper, we provide theoretical foundations on the power and robustness of the model-X knockoffs procedure introduced recently in Candès, Fan, Janson and Lv (2018) in the high-dimensional setting when the covariate distribution is characterized by a Gaussian graphical model. We establish that, under mild regularity conditions, the power of the oracle knockoffs procedure with known covariate distribution in high-dimensional linear models is asymptotically one as the sample size goes to infinity. When moving away from the ideal case, we suggest a modified model-X knockoffs method called graphical nonlinear knockoffs (RANK) to accommodate the unknown covariate distribution. We provide theoretical justifications for the robustness of our modified procedure by showing that, with the estimated covariate distribution, the false discovery rate (FDR) is asymptotically controlled at the target level and the power is asymptotically one. To the best of our knowledge, this is the first formal theoretical result on the power of the knockoffs procedure. Simulation results demonstrate that, compared to existing approaches, our method performs competitively in both FDR control and power. A real data set is analyzed to further assess the performance of the suggested knockoffs procedure. This is joint work with Emre Demirkaya, Yingying Fan and Gaorong Li.

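In modern big data applications with general high-dimensional nonlinear models, power and reproducibility are key to enabling precise scientific discoveries. For the model-X knockoffs procedure introduced by Candès, Fan, Janson and Lv (2018), this paper provides theoretical foundations for its power and robustness in the high-dimensional setting when the covariate distribution follows a Gaussian graphical model. We prove that, under mild regularity conditions, the power of the oracle knockoffs procedure with known covariate distribution in high-dimensional linear models tends to one as the sample size goes to infinity. When moving away from the ideal case, that is, when the covariate distribution is unknown, we suggest a modified model-X knockoffs method called graphical nonlinear knockoffs (RANK). We show that, with the estimated covariate distribution, the false discovery rate (FDR) of RANK is asymptotically controlled at the target level and its power tends to one, which provides a theoretical justification for the robustness of the procedure. To the best of our knowledge, this is the first formal theoretical result on the power of the knockoffs procedure. Simulation results show that, compared with existing methods, our method is competitive in both FDR control and power. A real data set is analyzed to further assess the performance of RANK. This is joint work with Emre Demirkaya, Yingying Fan and Gaorong Li.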
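
For readers unfamiliar with how a knockoffs procedure turns feature statistics into a selected set at a target FDR level q, the following is a minimal sketch of the standard knockoff+ selection rule from the knockoff filter literature. It is an illustration only, not the RANK implementation from the talk: the function names and the example statistics `W` are hypothetical, and the sketch assumes that a vector of knockoff statistics (large positive values suggesting a true signal, with signs symmetric for null features) has already been computed.

```python
import numpy as np

def knockoff_plus_threshold(W, q):
    """Return the smallest t > 0 with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.

    Returns np.inf when no candidate threshold satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of the statistics.
    candidates = np.unique(np.abs(W[W != 0]))  # np.unique returns sorted values
    for t in candidates:
        # Negative statistics beyond -t act as a proxy count of false discoveries.
        false_proxy = 1 + np.sum(W <= -t)
        # Number of features that would be selected at threshold t.
        selected = max(1, np.sum(W >= t))
        if false_proxy / selected <= q:
            return float(t)
    return np.inf

def knockoff_select(W, q):
    """Indices of features whose statistic meets the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)

# Hypothetical statistics for ten features, target FDR level q = 0.2.
W_example = [5, 4, 3.5, 3, -1, 2, 0.5, -0.2, 6, 7]
threshold = knockoff_plus_threshold(W_example, q=0.2)
chosen = knockoff_select(W_example, q=0.2)
print(threshold, chosen.tolist())
```

The extra 1 in the numerator is what distinguishes knockoff+ from the plain knockoff threshold; it makes the rule more conservative when few features are selected.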