Prof. Feng Liang, University of Illinois Urbana-Champaign: Learning Topic Models: Identifiability and Finite-Sample Analysis - Center for Statistical Research

Guanghua Forum: Forum of Public Figures and Entrepreneurs, Session No.

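Topic: Learning Topic Models: Identifiability and Finite-Sample Analysis

Speaker: Prof. Feng Liang, University of Illinois Urbana-Champaign

Host: Prof. Huazhen Lin, School of Statistics

Time: Thursday, March 30, 2023, 1:00-2:00 PM

Venue: Room 408, Hongyuan Building, Liulin Campus

Organizers: Center for Statistical Research and the Research Office of the School of Statistics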

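About the speaker:

Feng Liang is a Professor in the Department of Statistics at the University of Illinois Urbana-Champaign. She received her Ph.D. in Statistics from Yale University in May 2002; before that, she earned a B.S. in Mathematics from Peking University. Her research interests include Bayesian statistics, high-dimensional data analysis, and information theory.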

Abstract:

Topic models provide a useful text-mining tool for learning, extracting, and discovering latent structures in large text corpora. Although a plethora of methods have been proposed for topic modeling, the literature lacks a formal theoretical investigation of the statistical identifiability and accuracy of latent topic estimation. In this work, we propose a maximum likelihood estimator (MLE) of latent topics based on a specific integrated likelihood, which is naturally connected to the concept of volume minimization in computational geometry. Theoretically, we introduce a new set of geometric conditions for topic model identifiability, which are weaker than conventional separability conditions relying on the existence of anchor words or pure topic documents. We conduct a finite-sample error analysis of the proposed estimator and discuss the connection between our results and existing ones. We conclude with empirical studies on both simulated and real datasets. This talk is based on joint work with Yinyin Chen, Shishuang He, and Yun Yang.
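To make the identifiability problem and the volume idea concrete, here is a minimal Python sketch. It assumes the standard formulation in which each document's word distribution is a convex combination of K topic vectors, i.e. Theta = B @ W with the columns of B and W on probability simplices. It is not the speaker's estimator: it does not compute the integrated likelihood. It only shows that an "inflated" set of topics reproduces exactly the same Theta, so the factorization is not identifiable without extra conditions, and that a volume criterion separates the two. The matrices B and W, the inflation factor t, and the helper simplex_volume are hypothetical choices made for illustration.

```python
import math

import numpy as np


def simplex_volume(B):
    """(K-1)-dimensional volume of the simplex whose K vertices are the columns of B (V x K)."""
    K = B.shape[1]
    E = B[:, :-1] - B[:, [-1]]      # edge vectors from the last vertex, shape V x (K-1)
    gram = E.T @ E                  # (K-1) x (K-1) Gram matrix of the edges
    return math.sqrt(max(np.linalg.det(gram), 0.0)) / math.factorial(K - 1)


# Hypothetical topic-word matrix: V = 4 words, K = 3 topics, each column sums to 1.
B = np.array([[0.6, 0.1, 0.1],
              [0.2, 0.6, 0.1],
              [0.1, 0.2, 0.6],
              [0.1, 0.1, 0.2]])
K = B.shape[1]

# Hypothetical topic weights for D = 50 documents, each column on the K-simplex.
rng = np.random.default_rng(0)
W = rng.dirichlet(np.ones(K), size=50).T
Theta = B @ W                       # V x D document-level word distributions

# "Inflated" alternative: push every topic away from the topic centroid by (1 + t).
t = 0.2
c = B.mean(axis=1, keepdims=True)
B_alt = c + (1 + t) * (B - c)       # columns still sum to 1 (and stay nonnegative here)
W_alt = (W + t / K) / (1 + t)       # matching weights: nonnegative, columns sum to 1

print(np.allclose(B_alt @ W_alt, Theta))           # both factorizations fit the same Theta
print(simplex_volume(B_alt) / simplex_volume(B))   # volume ratio (1 + t) ** (K - 1)
```

Both factorizations reproduce Theta exactly, yet the inflated simplex is larger by a factor of (1 + t)^(K - 1), so a criterion that prefers the smallest enclosing simplex picks B over B_alt. Which conditions guarantee that the minimum-volume solution is unique is exactly the identifiability question the talk addresses; this toy example does not settle it.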
