
Associate Professor Heng Peng, Hong Kong Baptist University: BOLT-SSI: Fully Screening Interaction Effects for Ultra-High Dimensional Data

Guanghua Forum: Celebrities and Entrepreneurs Forum, Session 5585


Topic: BOLT-SSI: Fully Screening Interaction Effects for Ultra-High Dimensional Data

Speaker: Associate Professor Heng Peng, Hong Kong Baptist University

Host: Professor Huazhen Lin, Center of Statistical Research, School of Statistics

Time: Thursday, November 7, 2019, 11:00 a.m. to 12:00 noon

Venue: Conference Room 408, Hongyuan Building, Liulin Campus, Southwestern University of Finance and Economics

Organizers: Center of Statistical Research, School of Statistics, Office of Scientific Research


About the speaker:

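   Heng Peng is an associate professor in the Department of Mathematics at Hong Kong Baptist University. He received his PhD in statistics from The Chinese University of Hong Kong in 2003 and was a postdoctoral researcher at Princeton University from 2003 to 2006. His research focuses on nonparametric and semiparametric models, model selection, high-dimensional data modeling, and mixture models. He is a member of the IMS, served as an associate editor of Statistica Sinica from 2011 to 2014, and is currently an associate editor of Computational Statistics and Data Analysis. He has refereed for Annals, JASA, JRSSB, Biometrika, Statistica Sinica, and other journals, and has published more than ten papers in leading international statistics journals, including Annals, JASA, Statistica Sinica, TEST, and Computational Statistics and Data Analysis.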

Abstract:

Detecting interaction effects among predictor variables on a response variable is often a crucial step in regression modeling of real data across many applications. In this paper, using marginal likelihood functions, we first introduce a simple sure screening procedure (SSI) that fully detects significant pure interactions between predictor variables and the response variable in high- or ultra-high-dimensional generalized linear regression models. Furthermore, we suggest discretizing continuous predictor variables and using Boolean operations to compute the marginal likelihood estimates. The resulting procedure, BOLT-SSI, is proposed to accelerate the sure screening. We investigate the sure screening properties of both SSI and BOLT-SSI. Our study has several important features. First, for most ultra-high-dimensional data encountered in practice, the proposed sure screening methods can fully detect any pure interaction effects, a task that appears impossible from a theoretical standpoint. Second, the proposed method takes advantage of computer architecture to speed up the algorithm and strikes a trade-off between computational burden and statistical modeling efficiency. In particular, taking interaction-effect detection as an example, this study shows the limitations of theoretical investigation from a practical standpoint and illustrates how to balance engineering techniques against theoretical investigation.
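The abstract does not spell out how Boolean operations enter the marginal likelihood computation, so the following is only a minimal sketch under stated assumptions, not the speaker's implementation. It assumes a binary response and predictors already discretized to a few levels, packs each level's indicator vector into a bitmask, and obtains the contingency-table cell counts for a pair of predictors by bitwise AND followed by a bit count. From these counts it evaluates the log-likelihood of a saturated binomial model (one free success probability per cell), which is one simple likelihood quantity available in closed form. All function names (`pack`, `level_masks`, `cell_counts`, `saturated_loglik`) and the toy data are hypothetical.

```python
from itertools import product
from math import log


def pack(indicator):
    """Pack a 0/1 sequence into one Python int; bit i corresponds to sample i."""
    bits = 0
    for i, flag in enumerate(indicator):
        if flag:
            bits |= 1 << i
    return bits


def level_masks(values, n_levels):
    """Return one bitmask per level of a discretized variable."""
    return [pack(v == level for v in values) for level in range(n_levels)]


def popcount(x):
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def cell_counts(xj, xk, y, n_levels):
    """Counts n[(a, b, c)] = #{i : xj[i] == a, xk[i] == b, y[i] == c},
    computed with bitwise AND and a bit count instead of a loop over samples."""
    mj = level_masks(xj, n_levels)
    mk = level_masks(xk, n_levels)
    my = level_masks(y, 2)  # assumption: binary response coded 0/1
    return {
        (a, b, c): popcount(mj[a] & mk[b] & my[c])
        for a, b, c in product(range(n_levels), range(n_levels), range(2))
    }


def saturated_loglik(counts, n_levels):
    """Binomial log-likelihood with a free success probability per (a, b) cell."""
    total = 0.0
    for a, b in product(range(n_levels), repeat=2):
        n0, n1 = counts[(a, b, 0)], counts[(a, b, 1)]
        n = n0 + n1
        for k in (n0, n1):
            if k > 0:  # empty outcomes contribute 0 to the log-likelihood
                total += k * log(k / n)
    return total


# Toy data (hypothetical): two predictors with 3 levels each, binary response.
xj = [0, 1, 2, 0, 1, 2, 0, 1]
xk = [0, 0, 1, 1, 2, 2, 0, 1]
y = [0, 1, 1, 0, 1, 0, 1, 1]
counts = cell_counts(xj, xk, y, n_levels=3)
```

In a screening loop, the level masks would be built once per variable and reused for every pair, so each pair costs only a handful of word-level AND and bit-count operations; this is the sense in which such a scheme exploits computer architecture. A full interaction screen would compare this saturated fit against a main-effects-only fit, which is not shown here.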

