
OCIS · Fall 2025

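  • 9 talks · 9 close readings

Season overview

Automatically generated: summarizes this season's main threads and the talks worth watching first; no scores, no rankings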

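This season's 9 talks fall roughly into four threads: identification and sensitivity analysis under unobserved confounding (Cinelli, Antonelli, Miles, Wang); semiparametric model diagnostics and explainability (Shah, Gao); causal discovery and software tools (Runge); and transfer learning and causal representation learning (Ha, Young researchers). Unobserved confounding is the central issue running through several talks, but each approaches it from a different angle: Cinelli starts from sensitivity analysis and Antonelli from partial identification, Miles uses the separable-effects framework to handle an extreme positivity violation, and Wang introduces synthetic instruments in a high-dimensional sparse setting.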

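The most prominent thread is how to deal with unobserved confounding, with four talks. Cinelli generalizes the classical omitted variable bias (OVB) formula to nonparametric causal parameters (the ATE, average causal derivatives, and others), giving unified bias bounds for flexible estimators such as DML. Antonelli exploits the multiple-treatment, multiple-outcome structure to derive partial identification regions under a factor confounding assumption, complementing Cinelli's sensitivity analysis (the former yields identification regions, the latter bounds on the bias). Miles uses the separable-effects framework to separate the causal effects of surgery and anesthesia when the two are perfectly collinear (an extreme positivity violation), showing how structural causal models apply in extreme settings. Wang's synthetic instrument exploits a sparse-causation assumption among high-dimensional exposures to extract causal signal from associations, in contrast to Antonelli's approach based on multivariate structure.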
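As a reference point for the "classical OVB formula" mentioned above, the textbook linear identity can be checked numerically. This is a minimal sketch of the linear case only, not the nonparametric theory from Cinelli's talk; the simulated data, the coefficients (1.0, 2.0, 0.8), and all function names are illustrative assumptions.

```python
import random

def cov(a, b):
    """Sample covariance (divides by n)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def ols_two(y, d, z):
    """Slopes from OLS of y on (1, d, z), via the 2x2 normal equations."""
    sdd, szz, sdz = cov(d, d), cov(z, z), cov(d, z)
    sdy, szy = cov(d, y), cov(z, y)
    det = sdd * szz - sdz ** 2
    tau = (sdy * szz - szy * sdz) / det
    gamma = (szy * sdd - sdy * sdz) / det
    return tau, gamma

# Hypothetical data: z confounds the effect of treatment d on outcome y.
random.seed(0)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]
d = [0.8 * zi + random.gauss(0, 1) for zi in z]
y = [1.0 * di + 2.0 * zi + random.gauss(0, 1) for di, zi in zip(d, z)]

tau_short = cov(d, y) / cov(d, d)   # "short" regression: z omitted
tau_long, gamma = ols_two(y, d, z)  # "long" regression: z included
delta = cov(d, z) / cov(d, d)       # slope of z on d

# In-sample OVB identity: tau_short = tau_long + gamma * delta
print(tau_short, tau_long + gamma * delta)
```

The two printed numbers agree up to floating-point error, because the identity is an algebraic property of OLS rather than a large-sample approximation.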

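A second thread is model diagnostics and explainability, with two talks. Shah proposes the "hunt and test" framework for goodness-of-fit testing of semiparametric regression models (such as GAMs and partially linear models); the core idea is to split the data, hunt for signal on one part, and then construct the test on the other, which sidesteps the difficulty of bandwidth selection. Gao's causal ANOVA extends the classical functional ANOVA from associational to causal, using counterfactual contrasts and a treatment of dependence that follows the causal ordering to give a causal-level decomposition of variable importance for complex models. Both talks touch on the basic question of "how do we judge whether a model is good enough", but Shah focuses on statistical tests for rejecting a model, while Gao focuses on causal attribution of explanatory power.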
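The split-then-test idea described above can be illustrated with a toy example. This is a simplified sketch under strong assumptions, not Shah's procedure: the null model is a plain linear fit, the "hunt" picks from a small hypothetical candidate set (`CANDIDATES`), and the test is a normal-approximation score statistic. All names here are illustrative.

```python
import math
import random

def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

def residuals(x, y):
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def score_stat(x, y, h):
    """Standardized covariance between null-model residuals and h(x)."""
    r = residuals(x, y)                       # residuals of the null model
    ht = residuals(x, [h(xi) for xi in x])    # h(x) residualized on (1, x)
    prods = [ri * hi for ri, hi in zip(r, ht)]
    return sum(prods) / math.sqrt(sum(p * p for p in prods))

# Hypothetical candidate departures from linearity.
CANDIDATES = {
    "square": lambda t: t * t,
    "cube": lambda t: t ** 3,
    "sine": lambda t: math.sin(3 * t),
}

def hunt_and_test(x, y, seed=0):
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    xa, ya = [x[i] for i in idx[:half]], [y[i] for i in idx[:half]]
    xb, yb = [x[i] for i in idx[half:]], [y[i] for i in idx[half:]]
    # Hunt: on part A, pick the candidate with the strongest signal.
    best = max(CANDIDATES, key=lambda k: abs(score_stat(xa, ya, CANDIDATES[k])))
    # Test: on part B, test only the direction chosen on part A.
    z = score_stat(xb, yb, CANDIDATES[best])
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal p-value
    return best, z, p

rng = random.Random(1)
x = [rng.uniform(-2, 2) for _ in range(2000)]
y_null = [1 + 0.5 * xi + rng.gauss(0, 1) for xi in x]
y_alt = [1 + 0.5 * xi + 0.5 * xi * xi + rng.gauss(0, 1) for xi in x]

best_null, z_null, p_null = hunt_and_test(x, y_null)
best_alt, z_alt, p_alt = hunt_and_test(x, y_alt)
print(best_null, p_null)
print(best_alt, p_alt)
```

Because the direction is chosen on one half and tested on the other, the selection step does not invalidate the p-value computed on the held-out half.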

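The remaining three talks cover causal discovery tooling (Runge's Tigramite package, which specializes in the PCMCI+ algorithm for time series and in optimal adjustment sets), the theory of semi-supervised domain adaptation (Ha uses structural causal models to model distribution shift explicitly and analyzes when fine-tuning strategies are effective), and causal representation learning (the Young researchers' talk learns causal cellular programs from large-scale perturbation data within a latent-variable DAG framework). These three are only loosely connected to the threads above, but Ha's causal view of domain adaptation and Runge's time-series causal discovery both involve learning causal structure under distribution shift, and can be seen as extensions of causal inference to complex data settings.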

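To get a quick grasp of this season's core contributions, the following viewing paths are suggested. Unobserved confounding: start with Cinelli (the unified OVB framework as a foundation), then Antonelli (multivariate partial identification) and Miles (handling an extreme positivity violation), and finish with Wang (the high-dimensional sparse setting) as the advanced step. Model diagnostics: Shah (semiparametric GoF testing) and Gao (causal ANOVA) can be watched independently of each other; the former leans toward testing theory, the latter toward explainability. Causal discovery and software: Runge's Tigramite demonstration suits viewers who want to get hands-on with time-series causal analysis right away. Transfer and representation learning: Ha (SSDA theory) and Young researchers (causal representation learning) suit viewers interested in distribution shift or latent-variable modeling.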

Talk list

Long Story Short: Omitted Variable Bias in Causal Machine Learning

Speaker: Carlos Cinelli · Discussant: Dominik Rothenhäusler · 2025-12-09
Links: Video · Slides

Abstract We develop a general theory of omitted variable bias for a wide range of common causal parameters, including (but not limited to) averages of potential outcomes, average treatment effects, average causal derivatives, and policy effects from covariate shifts. Our theory applies to nonparametric models, while naturally allowing for (semi-)parametric restrictions (such as partial linearity) when such assumptions are made. We show how simple plausibility judgments on the maximum explanatory power of…

Hunt and test for assessing the fit of semiparametric regression models

Speaker: Rajen Shah · Discussant: Mats Stensrud · 2025-12-02
Links: Video · Slides

Abstract We consider testing the goodness of fit of semiparametric regression models, such as generalised additive models, partially linear models, and quantile additive regression models: a class of problems that includes, for example, testing for heterogeneous treatment effects. We propose an approach that involves splitting the data in two parts. On one part, we "hunt" for any signal that may be present in the score-type residuals following a fit of the null model. On the remaining data, we test for t…

Addressing an extreme positivity violation to distinguish the causal effects of surgery and anesthesia via separable effects

Speaker: Caleb Miles · Discussants: James Robins and Thomas Richardson · 2025-11-18
Links: Video · Slides · arXiv

Abstract The U.S. Food and Drug Administration has cautioned that prenatal exposure to anesthetic drugs during the third trimester may have neurotoxic effects; however, there is limited clinical evidence available to substantiate this recommendation. One major scientific question of interest is whether such neurotoxic effects might be due to surgery, anesthesia, or both. Isolating the effects of these two exposures is challenging because they are observationally equivalent, thereby inducing an extreme po…

Explainability and Analysis of Variance

Speaker: Zijun Gao · Discussant: Art Owen · 2025-11-04
Links: Video · Slides · arXiv

Abstract Existing tools for explaining complex models and systems are associational rather than causal and do not provide mechanistic understanding. We propose a new notion called counterfactual explainability for causal attribution that is motivated by the concept of genetic heritability in twin studies. Counterfactual explainability extends methods for global sensitivity analysis (including the functional analysis of variance and Sobol's indices), which assumes independent explanatory variables, to dep…

The synthetic instrument: From sparse association to sparse causation

Speaker: Linbo Wang · Discussant: Zijian Guo · 2025-10-28
Links: Video · Slides · arXiv

Abstract In many observational studies, researchers are often interested in studying the effects of multiple exposures on a single outcome. Standard approaches for high-dimensional data such as the lasso assume the associations between the exposures and the outcome are sparse. These methods, however, do not estimate the causal effects in the presence of unmeasured confounding. In this paper, we consider an alternative approach that assumes the causal effects in view are sparse. We show that with sparse c…

When few labeled target data suffice: a theory of semi-supervised domain adaptation via fine-tuning from multiple adaptive starts

Speaker: Wooseok Ha · Discussant: Jason Klusowski · 2025-10-21
Links: Video · Slides · arXiv

Abstract Semi-supervised domain adaptation (SSDA) aims to achieve high predictive performance in the target domain with limited labeled target data by exploiting abundant source and unlabeled target data. Despite its significance in numerous applications, theory on the effectiveness of SSDA remains largely unexplored, particularly in scenarios involving various types of source-target distributional shifts. In this talk, I will present a theoretical framework based on structural causal models (SCMs) which…

Causal Inference on Time Series Data with the Tigramite Package

Speaker: Jakob Runge · 2025-10-14
Links: Video · Slides

Abstract This talk introduces the open-source Python package Tigramite, which implements constraint-based algorithms such as PCMCI+ and many variants thereof as methods optimised for causal discovery on time series. In addition, Tigramite features causal effect estimation using optimal adjustment. I will outline the basic ideas behind PCMCI and optimal adjustment and then demonstrate practical workflows in Tigramite, including a user-friendly guide to choosing methods in causal inference based on causal…

Partial identification and unmeasured confounding with multiple treatments and multiple outcomes

Speaker: Joseph Antonelli · 2025-09-30
Links: Video · Slides · arXiv

Abstract Estimating the health effects of multiple air pollutants is a crucial problem in public health, but one that is difficult due to unmeasured confounding bias. Motivated by this issue, we develop a framework for partial identification of causal effects in the presence of unmeasured confounding in settings with multiple treatments and multiple outcomes. Under a factor confounding assumption, we show that joint partial identification regions for multiple estimands can be more informative than consid…

Learning causal cellular programs from large-scale perturbations

Speaker: Young researchers' seminar · 2025-09-23
Links: Video · Slides · arXiv

Abstract Complex molecular mechanisms govern cellular functions in living organisms and shape their behavior in health and disease. Understanding these mechanisms can greatly accelerate therapeutic discovery, yet it remains challenging due to the high dimensionality and intricate dependencies within biological systems. Recent advances in experimental technologies, however, are beginning to make this problem more tractable. For example, we can now systematically perturb individual or combinations of genes…

Maintained by 陈星宇 · Homepage · Source
