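Guanghua Forum: Distinguished Public Figures and Entrepreneurs Forum, Session 5585
Topic: BOLT-SSI: Fully Screening Interaction Effects for Ultra-High Dimensional Data
Host: Professor Lin Huazhen (林华珍), Center of Statistical Research, School of Statistics
Organizers: Center of Statistical Research; School of Statistics; Office of Scientific Research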
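Peng Heng (彭衡) is an associate professor in the Department of Mathematics at Hong Kong Baptist University. He received his PhD in statistics from The Chinese University of Hong Kong in 2003 and was a postdoctoral researcher at Princeton University from 2003 to 2006. His research focuses on nonparametric and semiparametric models, model selection, high-dimensional data modeling, and mixture models. He is a member of the IMS, served as an associate editor of Statistica Sinica from 2011 to 2014, and is currently an associate editor of Computational Statistics and Data Analysis. He has served as a referee for Annals, JASA, JRSSB, Biometrika, Statistica Sinica, and other journals, and has published more than ten papers in leading international statistics journals, including Annals, JASA, Statistica Sinica, TEST, and Computational Statistics and Data Analysis.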
Detecting interaction effects between predictor variables and a response variable is often a crucial step in regression modeling of real data across many applications. In this paper, we first use marginal likelihood functions to introduce a simple sure screening procedure (SSI) that fully detects significant pure interactions between predictor variables and the response variable in high or ultra-high dimensional generalized linear regression models. Furthermore, we suggest discretizing continuous predictor variables and using Boolean operations to compute the marginal likelihood estimates. The resulting procedure, called BOLT-SSI, is proposed to accelerate the screening. We investigate the sure screening properties of both SSI and BOLT-SSI. Our study has several important features. First, for most ultra-high dimensional data encountered in practice, the proposed sure screening methods can fully detect any pure interaction effects, a task that appears impossible to complete from a purely theoretical point of view. Second, the proposed method takes advantage of computer architecture to speed up the algorithm and to trade off computational burden against statistical modeling efficiency. In particular, treating interaction detection as a special example, we use this study to show the limitations of theoretical investigation from a practical point of view, and to illustrate how to balance engineering techniques against theoretical investigation.
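To make the Boolean-operation idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the speaker's implementation: it assumes a binary response, a two-level (median) discretization of each predictor, and uses the interaction log odds ratio from the 2x2x2 contingency table as a stand-in screening score. All function names (`binarize`, `pack_bits`, `cell_counts`, `interaction_score`) are hypothetical.

```python
import math
import statistics


def binarize(values):
    """Assumed discretization: 1 if the value exceeds the sample median, else 0."""
    med = statistics.median(values)
    return [1 if v > med else 0 for v in values]


def pack_bits(bits):
    """Pack a 0/1 sequence into one Python int; bit i holds sample i."""
    mask = 0
    for i, b in enumerate(bits):
        if b:
            mask |= 1 << i
    return mask


def popcount(x):
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def cell_counts(x1_bits, x2_bits, y_bits, n):
    """counts[a][b][c] = number of samples with x1 == a, x2 == b, y == c.

    Each count is one bitwise AND of three masks followed by a popcount,
    so no per-sample loop is needed once the data are packed.
    """
    full = (1 << n) - 1  # mask with the lowest n bits set
    x1 = (full ^ x1_bits, x1_bits)  # index 0: samples where x1 == 0
    x2 = (full ^ x2_bits, x2_bits)
    y = (full ^ y_bits, y_bits)
    return [[[popcount(x1[a] & x2[b] & y[c]) for c in (0, 1)]
             for b in (0, 1)]
            for a in (0, 1)]


def interaction_score(counts):
    """Interaction log odds ratio for binary x1, x2 and y.

    This equals the interaction coefficient of a saturated logistic model
    on two binary predictors; 0.5 is added to every cell (an assumption
    made here) so that empty cells do not produce log(0).
    """
    def log_odds(a, b):
        return math.log((counts[a][b][1] + 0.5) / (counts[a][b][0] + 0.5))

    return log_odds(1, 1) - log_odds(1, 0) - log_odds(0, 1) + log_odds(0, 0)
```

In a screening loop, each predictor would be packed once, and every pair of predictors would then be scored with a handful of AND and popcount operations; pairs with the largest absolute scores would be retained. How the talk's procedure ranks and thresholds pairs is not specified in the abstract, so that step is left out here.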