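Topic: Statistical Inference in High-Dimensional Problems
Speaker: Professor Cun-Hui Zhang, Rutgers University
Host: Professor Huazhen Lin (林华珍), School of Statistics
Time: Friday, October 20, 2023, 4:00-5:00 p.m.
Venue: Conference Room 408, Hongyuan Building, Liulin Campus
Organizers: School of Statistics; Office of International Exchange and Cooperation; Office of Scientific Research
About the speaker: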
Cun-Hui Zhang, Distinguished Professor of Statistics at Rutgers University, is a Fellow of the Institute of Mathematical Statistics (IMS) and a Fellow of the American Statistical Association (ASA). His research interests include high-dimensional data, empirical Bayes, time series, nonparametric methods, multivariate analysis, survival data and biostatistics, functional MRI, closed-loop diabetes control, and network tomography.
Abstract:
We provide necessary and sufficient conditions for the chi-squared and normal approximations of Pearson's chi-squared statistics for the test of independence and the goodness-of-fit test, as well as necessary and sufficient conditions for the normal approximation of the likelihood ratio and Hellinger statistics, when the cell probabilities of the multinomial data follow a general pattern and the dimension diverges with the sample size. This theory allows a majority of cells to have zero counts. A cross-sample chi-squared statistic for testing independence applies to two-way contingency tables with diverging dimensions. A degrees-of-freedom adjusted chi-squared approximation applies continuously throughout the high-dimensional regime and matches Pearson's chi-squared statistic in both mean and variance. Specific examples are provided to demonstrate the validity of the chi-squared and normal approximations for the three types of test statistics when the classical regularity conditions and guidelines are violated. Simulation results demonstrate that the chi-squared and normal approximations are more robust for the likelihood ratio and Hellinger statistics than for Pearson's chi-squared statistic. This part of the talk is based on joint work with Chong Wu and Yisha Yao.
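For readers unfamiliar with the three goodness-of-fit statistics named above, the following is a minimal sketch of how they are commonly defined for multinomial counts. It is not taken from the talk: the function name `gof_statistics`, the convention 0·log 0 = 0, and the factor 4 in the Hellinger statistic (the power-divergence normalization) are assumptions, and the speaker's work may use a different scaling.

```python
import numpy as np


def gof_statistics(counts, probs):
    """Pearson, likelihood-ratio and Hellinger goodness-of-fit statistics.

    counts: observed multinomial cell counts (zeros are allowed).
    probs:  hypothesized cell probabilities (all strictly positive).
    Hypothetical helper for illustration only.
    """
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    n = counts.sum()
    expected = n * probs

    # Pearson: sum over cells of (observed - expected)^2 / expected.
    pearson = np.sum((counts - expected) ** 2 / expected)

    # Likelihood ratio: 2 * sum of observed * log(observed / expected),
    # with empty cells contributing zero (assumed convention 0*log 0 = 0).
    pos = counts > 0
    likelihood_ratio = 2.0 * np.sum(
        counts[pos] * np.log(counts[pos] / expected[pos])
    )

    # Hellinger: 4 * sum of (sqrt(observed) - sqrt(expected))^2
    # (assumed normalization; other scalings appear in the literature).
    hellinger = 4.0 * np.sum((np.sqrt(counts) - np.sqrt(expected)) ** 2)

    return {
        "pearson": float(pearson),
        "likelihood_ratio": float(likelihood_ratio),
        "hellinger": float(hellinger),
    }


# Sparse regime: far more cells than observations, so most counts are zero.
rng = np.random.default_rng(0)
k, n = 2000, 500  # illustrative sizes, not from the talk
probs = np.full(k, 1.0 / k)
counts = rng.multinomial(n, probs)
print("fraction of empty cells:", np.mean(counts == 0))
print(gof_statistics(counts, probs))
```

In this sparse setting the classical rule of thumb (expected counts of at least 5 per cell) fails badly, which is the kind of regime the abstract addresses.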
In a second problem we study the estimation of a general function of a high-dimensional mean vector. The key element of our approach is a new method which we call High-Order Degenerate Statistical Expansion. It leverages the classical multivariate Taylor expansion and degenerate U-statistics and yields an elegant explicit formula. The formula expresses the error of the proposed estimator as the sum of a Taylor-Hoeffding series and an explicit remainder term in the form of the Riemann-Liouville integral, as in the Taylor expansion. The Taylor-Hoeffding series replaces the power of the average noise in the classical Taylor series by its degenerate version to give a Hoeffding decomposition as a weighted sum of degenerate U-products of the noises. This makes the proposed method a natural statistical version of the classical Taylor expansion. The proposed estimator can be viewed as a jackknife estimator of the Taylor-Hoeffding series and can be approximated by the bootstrap. Thus, the jackknife, bootstrap and Taylor expansion approaches all converge to the proposed estimator. We develop risk bounds for the proposed estimator under proper moment conditions and a central limit theorem under a second moment condition (even in expansions of higher than the second order). We apply this new method to several smooth and non-smooth functions under minimal moment constraints. This part of the talk is based on joint work with Fan Zhou and Ping Li.
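The idea of replacing a power of the average by its degenerate (off-diagonal) version can be seen in a toy scalar case. The sketch below is not the estimator from the talk: it only treats f(θ) = θ² for a one-dimensional mean, and the function names are hypothetical. The plug-in estimate (sample mean squared) averages over all pairs (i, j), including i = j, which introduces bias. The U-statistic drops the diagonal terms, and in this special case it coincides exactly with the jackknife bias-corrected plug-in estimate.

```python
import numpy as np


def plug_in_square(x):
    """Plug-in estimate of theta^2: the squared sample mean."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x) ** 2)


def u_stat_square(x):
    """Average of x_i * x_j over ordered pairs with i != j."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # sum over i != j of x_i x_j = (sum x)^2 - sum x^2
    return float((x.sum() ** 2 - np.sum(x ** 2)) / (n * (n - 1)))


def jackknife_square(x):
    """Jackknife bias-corrected version of the plug-in estimate."""
    x = np.asarray(x, dtype=float)
    n = x.size
    loo_means = (x.sum() - x) / (n - 1)  # leave-one-out sample means
    return float(n * np.mean(x) ** 2 - (n - 1) * np.mean(loo_means ** 2))


# Illustrative data (assumed): 20 noisy observations with true mean 1.
rng = np.random.default_rng(1)
x = rng.normal(loc=1.0, scale=2.0, size=20)
print("plug-in:    ", plug_in_square(x))
print("U-statistic:", u_stat_square(x))
print("jackknife:  ", jackknife_square(x))
```

The plug-in estimate exceeds the U-statistic by exactly s²/n, where s² is the unbiased sample variance, so removing the diagonal terms removes the leading bias term.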