• Center for Statistical Research

Associate Professor HaiYing Wang, University of Connecticut (USA): Subsampling for Rare Events Data and Maximum Sampled Conditional Likelihood

Topic: Subsampling for Rare Events Data and Maximum Sampled Conditional Likelihood





Organizers: Center for Statistical Research and School of Statistics; Office of Research


HaiYing Wang is an Associate Professor in the Department of Statistics at the University of Connecticut. His research interests include informative subdata selection for big data, model selection, model averaging, measurement error models, and semi-parametric regression. His research has been published in top statistics and machine learning journals (e.g., Biometrika, IEEE Transactions on Information Theory, JASA, and JMLR) and conferences (e.g., ICML and NeurIPS).



In this talk, we show that the available information about unknown parameters in rare events data is tied only to the relatively small number of positive cases, which justifies the use of negative sampling. However, if the negative instances are subsampled down to the same number as the positive cases, information is lost. To retain more information, we derive an optimal sampling probability for the inverse probability weighted (IPW) estimator. We further propose a likelihood-based estimator that improves estimation efficiency, and show that it has the smallest asymptotic variance among a large class of estimators. It is also more robust to pilot misspecification. The likelihood-based estimator is also generalized to a class of models beyond binary response models. We validate our approach on simulated data, the MNIST data, and a real click-through rate dataset with more than 0.3 trillion instances.
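To make the idea of negative sampling with an IPW estimator concrete, here is a minimal Python sketch. It is not the speaker's method: it assumes a plain logistic regression model and a single uniform sampling rate `rho` for the negatives, not the optimal sampling probabilities derived in the talk. The function names (`simulate_rare_events`, `subsample_negatives`, `weighted_logistic_fit`) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_rare_events(n, beta, rng):
    """Draw (X, y) from a logistic model; a very negative intercept makes y = 1 rare."""
    x = rng.normal(size=(n, len(beta) - 1))
    X = np.column_stack([np.ones(n), x])  # first column is the intercept
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    y = (rng.random(n) < p).astype(float)
    return X, y

def subsample_negatives(y, rho, rng):
    """Keep every positive; keep each negative independently with probability rho.

    Assumption: rho is uniform across negatives (not an optimal probability).
    Returns a boolean keep-mask and the IPW weights of the kept rows.
    """
    keep = (y == 1) | (rng.random(len(y)) < rho)
    weights = np.where(y == 1, 1.0, 1.0 / rho)  # inverse inclusion probabilities
    return keep, weights[keep]

def weighted_logistic_fit(X, y, w, n_iter=50, tol=1e-10):
    """Newton-Raphson maximizer of the weighted logistic log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))                        # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X   # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Illustrative run with made-up parameters.
rng = np.random.default_rng(0)
beta_true = np.array([-6.0, 1.0])
X, y = simulate_rare_events(200_000, beta_true, rng)
keep, w = subsample_negatives(y, rho=0.02, rng=rng)
beta_ipw = weighted_logistic_fit(X[keep], y[keep], w)
print("positives:", int(y.sum()), "rows kept:", int(keep.sum()))
print("IPW estimate:", beta_ipw)
```

Weighting each kept negative by `1 / rho` keeps the weighted score unbiased for the full-data score, which is why the subsample estimate targets the same parameters as a full-data fit while using only a small fraction of the negatives.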

