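Topic: Network Modeling and Goodness-of-Fit
Speaker: Professor Jiashun Jin, Carnegie Mellon University
Host: Professor Huazhen Lin (林华珍), School of Statistics
Time: Friday, October 11, 2024, 4:00-5:00 PM
Venue: Conference Room 408, Hongyuan Building, Liulin Campus
Organizers: Center of Statistical Research and School of Statistics; Office of Research
About the speaker: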
Jiashun Jin is Professor of Statistics & Data Science and Affiliated Professor of Machine Learning at Carnegie Mellon University. His earlier work was on the analysis of rare/weak signals in big data, focusing on the development of (Tukey’s) Higher Criticism and practical False Discovery Rate (FDR) controlling methods. His more recent interest is in the analysis of complex network and text data, where he has led a team collecting MADStat, a large-scale data set on statistical publications. In these areas, Jin has co-authored three Editor’s Invited Discussion papers and three Editor’s Invited Review papers.
Jin is an elected IMS Fellow and an elected ASA Fellow. He delivered the highly selective IMS Medallion Lecture in 2015 and the IMS AoAS (Annals of Applied Statistics) Lecture in 2016. He is also a recipient of the NSF CAREER Award and the IMS Tweedie Award. He has served as Associate Editor for several statistical journals and is currently serving as IMS Treasurer. Beyond his academic career, Jin has also gained valuable industry experience doing research at Two Sigma Investments and Google LLC.
Abstract:
The block-model family has four popular network models: SBM, MMSBM, DCBM, and DCMM. A fundamental question is how well each of these models fits real networks. We propose GoF-MSCORE as a new Goodness-of-Fit (GoF) metric for DCMM (the broadest of the four), built on two main ideas. The first is to use cycle count statistics as a general recipe for GoF. The second is a novel network fitting scheme. GoF-MSCORE is a flexible GoF approach, and we adapt it to all four models in the block-model family.
We show that for each of the four models, if the assumed model is correct, then the corresponding GoF metric converges to N(0,1) as the network size diverges. We also analyze the power of these metrics and show that they are optimal in many settings. For 12 real networks, we use the proposed GoF metrics to show that DCMM fits almost all of them well. We also show that SBM, DCBM, and MMSBM do not fit many of these networks well, especially when the networks are relatively large. Together with the mathematical tractability of the block-model family, these results suggest that DCMM is (or is close to) a sweet spot for network modeling.
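For readers unfamiliar with cycle counts, the sketch below shows what the raw ingredient looks like: it counts 3-cycles (triangles) and 4-cycles in a small undirected network from its adjacency matrix, using the standard trace identities. It is only an illustration under our own assumptions, not the GoF-MSCORE procedure. The abstract does not say how the cycle counts are centered, normalized, or combined with the fitting scheme, and the helper names (`count_triangles`, `count_4cycles`, `complete_graph`) are hypothetical.

```python
# Illustrative only: raw cycle counts for a simple undirected graph
# (0/1 symmetric adjacency matrix, zero diagonal). Not GoF-MSCORE itself.

def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

def count_triangles(adj):
    """Each triangle yields 6 closed walks of length 3, so divide trace(A^3) by 6."""
    a2 = matmul(adj, adj)
    return trace(matmul(a2, adj)) // 6

def count_4cycles(adj):
    """Closed walks of length 4 include degenerate back-and-forth walks;
    subtract them (2 * sum d_i^2 - sum d_i), then divide by 8."""
    a2 = matmul(adj, adj)
    deg = [sum(row) for row in adj]
    closed_walks = trace(matmul(a2, a2))
    return (closed_walks - 2 * sum(d * d for d in deg) + sum(deg)) // 8

def complete_graph(n):
    """Adjacency matrix of the complete graph on n nodes (toy input)."""
    return [[int(i != j) for j in range(n)] for i in range(n)]

k4 = complete_graph(4)
print(count_triangles(k4), count_4cycles(k4))  # prints: 4 3
```

Loosely speaking, a goodness-of-fit check of this kind would compare such observed counts with what a fitted model predicts; the exact statistic is the subject of the talk.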