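Guanghua Forum: Lecture Series for Public Figures and Entrepreneurs, Session No. 5481
Topic: An Empirical Bayes Solution for Selection Bias in Functional Data
Speaker: Professor Yingying Fan, University of Southern California
Host: Associate Professor Bin Guo
Time: Friday, July 5, 2019, 10:00-11:00 a.m.
Venue: Conference Room 408, Hongyuan Building, Liulin Campus, Southwestern University of Finance and Economics
Organizers: Center of Statistical Research, School of Statistics, and Office of Scientific Research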
About the Speaker:
Yingying Fan is Dean's Associate Professor in Business Administration in the Data Sciences and Operations Department of the Marshall School of Business at the University of Southern California, Associate Professor in the Departments of Economics and Computer Science at USC, and an Associate Fellow of the USC Dornsife Institute for New Economic Thinking (INET). She received her Ph.D. in Operations Research and Financial Engineering from Princeton University in 2007 and was a Lecturer in the Department of Statistics at Harvard University from 2007 to 2008. Her research interests include statistics, data science, machine learning, economics, and big data and business applications.
Abstract:
Selection bias arises when extreme observations are selected from a sample, and it is a well-recognized issue for standard scalar or multivariate data. Numerous approaches have been proposed to address it, dating back at least as far as the James-Stein shrinkage estimator. The same issue arises for functional data, albeit with additional complications. Given a set of observed functions, one may wish to select for further analysis those that are most extreme according to some metric, such as the average, variance, or maximum value of the function. However, because observed functions are often noisy realizations of an underlying mean process, the selected extreme functions are likely to yield biased estimates of the quantity of interest. In this paper we propose an empirical Bayes approach, based on a variant of Tweedie's formula, that adjusts such functional data to produce approximately unbiased estimates of the true mean functions. Our approach has several advantages. It is nonparametric in nature, yet it automatically shrinks back towards a James-Stein type estimator in low-signal situations. It is also computationally efficient and possesses desirable theoretical properties. Furthermore, we demonstrate through extensive simulations and real data analyses that our approach can produce significant improvements in prediction accuracy relative to possible competitors.
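The abstract does not spell out the functional variant of Tweedie's formula, so the following is only a rough illustration of the underlying idea: a minimal Python sketch of the classical scalar Tweedie correction, E[mu | X = x] = x + sigma^2 * d/dx log f(x), where f is the marginal density of the observations. It is not the speaker's method. Assumptions introduced here and not taken from the talk: Gaussian noise with known sigma, a Gaussian kernel density estimate of f with Silverman's rule-of-thumb bandwidth, simulated data, and the hypothetical function name `tweedie_posterior_mean`.

```python
import numpy as np


def tweedie_posterior_mean(x_query, x_sample, sigma, bandwidth=None):
    """Hypothetical helper: scalar Tweedie correction E[mu | X = x] = x + sigma^2 * (log f)'(x).

    Assumes X = mu + eps with eps ~ N(0, sigma^2) and sigma known.
    The marginal density f of X is estimated by a Gaussian KDE on x_sample.
    """
    x_query = np.asarray(x_query, dtype=float)
    x_sample = np.asarray(x_sample, dtype=float)
    n = x_sample.size
    if bandwidth is None:
        # Silverman's rule of thumb (an illustrative choice, not tuned)
        bandwidth = 1.06 * x_sample.std(ddof=1) * n ** (-0.2)
    # diff[i, j] = x_query[i] - x_sample[j]
    diff = x_query[:, None] - x_sample[None, :]
    log_k = -0.5 * (diff / bandwidth) ** 2
    # Softmax weights over sample points; the KDE normalising constant cancels in f'/f
    log_k -= log_k.max(axis=1, keepdims=True)
    w = np.exp(log_k)
    w /= w.sum(axis=1, keepdims=True)
    # For a Gaussian KDE, (log f)'(x) = -sum_j w_j * (x - x_j) / h^2
    score = -(w * diff).sum(axis=1) / bandwidth ** 2
    return x_query + sigma ** 2 * score


# Simulated example (assumed setup): true means mu_i ~ N(0, 1), noise sd sigma = 1
rng = np.random.default_rng(0)
n, sigma = 10_000, 1.0
mu = rng.normal(0.0, 1.0, size=n)          # true (unobserved) means
x = mu + rng.normal(0.0, sigma, size=n)    # noisy observations
top = np.argsort(x)[-100:]                 # select the 100 most extreme observations
naive = x[top]                             # naive estimate: the observation itself
corrected = tweedie_posterior_mean(x[top], x, sigma)
print("naive bias:    ", np.mean(naive - mu[top]))
print("corrected bias:", np.mean(corrected - mu[top]))
```

In this simulated Gaussian-Gaussian setting the exact posterior mean is x/2, a linear, James-Stein-type shrinkage, so the corrected values should roughly halve the selected observations, echoing the abstract's remark that the approach shrinks towards a James-Stein type estimator. With a non-Gaussian distribution of true means, the same formula shrinks non-linearly.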