The Fourth Machine Learning and Statistics Conference: Invited Talk Series (I)

Published: April 14, 2026

The Machine Learning and Statistics (MLSTAT) conference is an academic conference organized by the Machine Learning Branch of the Chinese Association for Applied Statistics. It aims to foster academic exchange among domestic and international scholars in machine learning and statistics, to cultivate a scholarly culture at the intersection of the two fields, and to advance them as foundational disciplines of data science and artificial intelligence, thereby supporting the growth of the digital economy.

The Fourth Machine Learning and Statistics Conference (MLSTAT2026) will be held on July 15-17, 2026 at Southwestern University of Finance and Economics in Chengdu, Sichuan. Around twenty young scholars will be invited to give plenary talks on recent advances in machine learning, artificial intelligence, statistics, applied mathematics, and related fields; PhD students are also welcome to present posters.


  

Title: A Statistical Framework for Alignment with Biased AI Feedback

Speaker bio: Zhanrui Cai is an assistant professor in the area of Innovation and Information Management at the HKU Business School, University of Hong Kong. He received his PhD in Statistics from Pennsylvania State University in 2021, after which he was a postdoctoral researcher at Carnegie Mellon University and then an assistant professor in the Department of Statistics at Iowa State University. His research interests include statistics for large language models, statistical inference under differential privacy, and applications of machine learning in statistical methodology.

Abstract: Modern alignment pipelines are increasingly replacing expensive human preference labels with evaluations from large language models (LLM-as-Judge). However, AI labels can be systematically biased compared to high-quality human feedback datasets. In this paper, we develop two debiased alignment methods within a general framework that accommodates heterogeneous prompt-response distributions and external human feedback sources. Debiased Direct Preference Optimization (DDPO) augments standard DPO with a residual-based correction and density-ratio reweighting to mitigate systematic bias, while retaining DPO's computational efficiency. Debiased Identity Preference Optimization (DIPO) directly estimates human preference probabilities without imposing a parametric reward model. We provide theoretical guarantees for both methods: DDPO offers a practical and computationally efficient solution for large-scale alignment, whereas DIPO serves as a robust, statistically optimal alternative that attains the semiparametric efficiency bound. Empirical studies on sentiment generation, summarization, and single-turn dialogue demonstrate that the proposed methods substantially improve alignment efficiency and recover performance close to that of an oracle trained on fully human-labeled data.
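The abstract gives no formulas, but the two ingredients it names for DDPO (density-ratio reweighting and a residual-based correction on top of the standard DPO loss) can be illustrated schematically. The sketch below is our own assumption of how such a corrected objective could look, not the authors' implementation; the function names, the per-pair residual, and the simple multiplicative weighting are all hypothetical.

```python
import math

def dpo_logit(pi_w, pi_l, ref_w, ref_l, beta=0.1):
    """Standard DPO preference logit for one (chosen, rejected) pair:
    beta * [(log pi(chosen) - log ref(chosen))
            - (log pi(rejected) - log ref(rejected))]."""
    return beta * ((pi_w - ref_w) - (pi_l - ref_l))

def debiased_dpo_loss(pairs, density_ratios, residuals, beta=0.1):
    """Hypothetical debiased DPO objective over a batch of log-prob tuples.

    `density_ratios` reweights each AI-labeled pair toward the human-feedback
    distribution, and `residuals` subtracts an estimated correction for
    systematic AI-judge bias; in practice both would be estimated from a
    small human-labeled dataset."""
    losses = []
    for (pi_w, pi_l, ref_w, ref_l), w, r in zip(pairs, density_ratios, residuals):
        # standard DPO per-pair loss: negative log-sigmoid of the logit
        nll = -math.log(1.0 / (1.0 + math.exp(-dpo_logit(pi_w, pi_l, ref_w, ref_l, beta))))
        losses.append(w * (nll - r))  # reweight and debias
    return sum(losses) / len(losses)
```

With unit weights and zero residuals this reduces to the plain DPO loss, which is the sanity check one would want any such correction to satisfy.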


Title: Optimizing Agentic Workflows: From Task-Level Search to Dynamic Construction

Speaker bio: Dr. Zhongxiang Dai is an assistant professor and doctoral supervisor at the School of Data Science, The Chinese University of Hong Kong, Shenzhen. He has been selected for a national-level young talent program and named a Presidential Young Scholar. He leads projects funded by the Guangdong Natural Science Foundation (Outstanding Youth Project), the National Natural Science Foundation of China (Young Scientists Fund, Category C), and the Shenzhen Natural Science Foundation (General Project), as well as a Huawei collaboration on LLM agents. He was a postdoctoral researcher at MIT in 2024 and at the National University of Singapore (NUS) from 2021 to 2023, and received his bachelor's degree in electrical engineering in 2015 and his PhD in computer science in 2021, both from NUS. His research spans both the theory and applications of machine learning. On the applied side, he focuses on large language models, including LLM-based agents, LLM personalization, online LLM routing, LLM-based social simulation, and LLM prompt optimization; on the theory side, he studies multi-armed bandit algorithms. He has published over 38 papers at top AI conferences and journals, more than 30 of them at NeurIPS, ICML, and ICLR, and serves as an Area Chair for NeurIPS, ICML, and ICLR.

Abstract: The rapid evolution of large language model (LLM) agents has demonstrated their immense potential in complex reasoning tasks, yet relying on manually designed agentic workflows fundamentally limits their adaptability and performance. In this talk, we present a trajectory of our recent research that progressively automates and refines workflow construction, moving from static to highly dynamic paradigms. We begin by introducing a task-level optimization framework that leverages bandit-guided graph evolution to discover a single, robust workflow capable of generalizing across all queries within a specific task domain. To push beyond the inherent performance ceiling of such task-level generalization, we then present a query-level orchestration method that utilizes contextual bandits to adaptively assign the optimal, customized workflow for every individual input. Finally, we address the limitations of one-shot, fixed workflow generation with Workflow-R1, a novel reinforcement learning-based approach for multi-turn workflow construction. Unlike previous methods where the workflow remains static after initial generation, Workflow-R1 optimizes group sub-sequence policies to dynamically construct and adjust the workflow online, conditioning subsequent steps directly on intermediate execution results. Together, these three works trace a clear path from static task-wide optimization to highly dynamic, execution-aware reasoning pathways, offering a comprehensive foundation for building more autonomous and resilient AI agents.
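The query-level orchestration step above assigns a workflow per input via a contextual bandit. As a generic illustration of that idea (a textbook LinUCB routing over a fixed set of candidate workflows, not the speaker's actual method; the workflow set, feature dimension, and reward signal are all toy assumptions):

```python
import numpy as np

class LinUCBRouter:
    """Minimal LinUCB contextual bandit: each arm is a candidate agentic
    workflow, and the context is a feature vector for the incoming query.
    The router picks the workflow with the highest upper confidence bound,
    then updates its per-arm statistics from the observed task reward."""

    def __init__(self, n_workflows, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_workflows)]    # per-arm Gram matrix
        self.b = [np.zeros(dim) for _ in range(n_workflows)]  # per-arm reward vector

    def select(self, x):
        """Return the index of the workflow with the highest UCB score."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                       # ridge estimate of arm parameters
            ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Record the observed quality of the chosen workflow on this query."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

After a few rounds of feedback, the router learns to send each query type to the workflow that historically performed best on similar contexts, while the confidence term keeps exploring under-tried workflows.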


Title: Multi-Agent Language Models: Games, Reasoning, and Alignment

Speaker bio: Xiaowu Dai is an assistant professor in the Departments of Statistics and Data Science, and of Biostatistics at UCLA. Before joining UCLA, he did a postdoc at UC Berkeley working with Prof. Mike Jordan, and received a Ph.D. in Statistics at UW-Madison advised by Prof. Grace Wahba. His research focuses on statistical theory and methodology for real-world problems that blend computational, inferential, and economic considerations.

Abstract: Large Language Models (LLMs) are prone to inconsistencies and hallucinations. We introduce Peer Elicitation Games (PEG), a training-free, game-theoretic framework for aligning LLMs through a peer elicitation mechanism involving a generator and multiple discriminators instantiated from distinct base models. Discriminators interact in a peer evaluation setting, where rewards are computed using a determinant-based mutual information score that provably incentivizes truthful reporting without requiring ground-truth labels. We establish theoretical guarantees showing that each agent, via online learning, achieves sublinear regret in the sense that their cumulative performance approaches that of the best fixed truthful strategy in hindsight. Moreover, we prove last-iterate convergence to a truthful Nash equilibrium, ensuring that the actual policies used by agents converge to stable and truthful behavior over time. I'll also discuss the extension of PEG to multi-agent reasoning and inference-time alignment.
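The key mechanism in the abstract is a determinant-based mutual information score computed between pairs of discriminators. A simplified version of such a score (in the spirit of the determinant mutual information mechanism from the peer-prediction literature; the exact score used in PEG may differ in detail, and the half-split construction here is an assumption):

```python
import numpy as np

def dmi_reward(reports_a, reports_b, n_labels=2):
    """Simplified determinant-based mutual information score between two
    discriminators' labels on a shared batch of items.

    The batch is split into two halves; the reward is det(M1) * det(M2),
    where Mk is the empirical joint-label count matrix on half k. Agents
    whose reports carry genuine mutual information about the items score
    higher than agents reporting uninformative (e.g. constant) labels."""
    n = len(reports_a) // 2

    def joint(a, b):
        M = np.zeros((n_labels, n_labels))
        for x, y in zip(a, b):
            M[x, y] += 1.0
        return M

    M1 = joint(reports_a[:n], reports_b[:n])
    M2 = joint(reports_a[n:], reports_b[n:])
    return float(np.linalg.det(M1) * np.linalg.det(M2))
```

A constant reporter makes one column of each count matrix zero, so its determinant (and hence the reward) collapses to zero, which is the intuition behind why such scores incentivize informative, truthful reporting without ground-truth labels.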


Conference registration:

To ensure the smooth running of the conference, a modest registration fee will be charged. The organizers will cover meals during the conference; all other expenses are at participants' own cost.

Registration fee: 200 RMB for student attendees, 500 RMB for all other attendees.

Registration deadline: June 30, 2026.

Conference updates will be posted on this official account and on the conference website; please follow both. Registration is handled through the conference website:

https://ml-stat.github.io/MLSTAT2026/register/


Contact:

Email: mlstat2026@126.com

Phone: 028-87092330








