Abstract: Causality lays the foundation for the trajectory of our world. Causal inference (CI), which aims to infer intrinsic causal relations among variables of interest, has emerged as a crucial research topic. Nevertheless, the lack of observation of important variables (e.g., confounders, mediators, exogenous variables, etc.) severely compromises the reliability of CI methods. The issue may arise from the inherent difficulty in measuring the variables. Additionally, in observational studies where variables are passively recorded, certain covariates might be inadvertently omitted by the experimenter. Depending on the type of unobserved variables and the specific CI task, various consequences can be incurred if these latent variables are carelessly handled, such as biased estimation of causal effects, incomplete understanding of causal mechanisms, lack of individual-level causal consideration, etc. In this survey, we provide a comprehensive review of recent developments in CI with latent variables. We start by discussing traditional CI techniques when variables of interest are assumed to be fully observed. Afterward, under the taxonomy of circumvention and inference-based methods, we provide an in-depth discussion of various CI strategies to handle latent variables, covering the tasks of causal effect estimation, mediation analysis, counterfactual reasoning, and causal discovery. Furthermore, we generalize the discussion to graph data where interference among units may exist. Finally, we offer fresh aspects for further advancement of CI with latent variables, especially new opportunities in the era of large language models (LLMs).

What problem does this paper attempt to address?

This paper addresses the problems that arise in causal inference (CI) when important variables (such as confounders, mediators, and exogenous variables) are not observed. Specifically, it focuses on how to conduct reliable causal inference in the presence of latent variables. The main problems are:

1. **Bias in causal effect estimation**: Unobserved confounders can bias the estimation of causal effects. For example, if disease severity is not taken into account, one may wrongly conclude that an effective drug reduces the recovery rate, because more severely ill patients are more likely to receive the drug.
2. **Incomplete understanding of causal mechanisms**: Missing mediators can leave the causal mechanism only partly understood. For example, the debate over the causal relationship between smoking and lung cancer was not resolved until tar deposits were identified as the cause of lung cancer.
3. **Difficulty in estimating individual-level causal effects**: Exogenous variables are usually treated as noise and are not explicitly included in the observed data. Without them, however, individual differences in treatment effects cannot be estimated, which hinders personalized counterfactual analysis.
4. **Impossibility of causal discovery**: If the variables of interest are not fully known, causal discovery becomes impossible.

To address these problems, the paper reviews recent progress in causal inference with latent variables and organizes the methods into two main categories:

1. **Circumvention-based Methods**: These methods avoid modeling latent variables directly. Instead, under certain strict assumptions or conditions, the causal effect can still be estimated without bias even when neither the latent variables nor their proxies are measured. Examples include using a small amount of randomized data to correct the bias in large-scale observational data, or using instrumental variables (IV) to "extract" pseudo-randomized data from observational data.
2. **Inference-based Methods**: These methods assume that even if the confounder \(C\) cannot be observed directly, its proxy variable \(W\) can. The proxies are used to infer the latent confounder and thereby remove the confounding bias. Examples include using matrix factorization (MF) to obtain low-rank components, or using the causal effect variational auto-encoder (CEVAE) to infer the latent confounder \(C\) from the observed data.

The paper also discusses how these methods apply to various causal tasks (causal effect estimation, causal mediation analysis, counterfactual reasoning, and causal discovery) and to graph data, and it looks ahead to future directions, especially the new opportunities brought by large language models (LLMs).
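The confounding bias and the instrumental-variable idea described above can be illustrated with a small simulation. The sketch below is not taken from the paper: the linear data-generating process, the coefficients, and the function names (`simulate`, `naive_slope`, `iv_slope`) are all assumptions chosen for illustration. A latent confounder `c` raises the treatment `t` and lowers the outcome `y`, loosely mirroring the drug and disease-severity example, while an instrument `z` affects `t` only.

```python
import numpy as np


def simulate(n=20000, true_effect=2.0, seed=0):
    """Draw (z, t, y) from an assumed linear model with a hidden confounder."""
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n)  # latent confounder, never returned to the caller
    z = rng.normal(size=n)  # instrument: independent of c, acts on y only via t
    t = 1.0 * z + 1.5 * c + rng.normal(size=n)          # treatment intensity
    y = true_effect * t - 3.0 * c + rng.normal(size=n)  # outcome
    return z, t, y


def naive_slope(t, y):
    """Regress y on t and ignore the confounder (biased when c is hidden)."""
    return np.cov(t, y)[0, 1] / np.var(t, ddof=1)


def iv_slope(z, t, y):
    """Wald / ratio IV estimator: cov(z, y) / cov(z, t)."""
    return np.cov(z, y)[0, 1] / np.cov(z, t)[0, 1]


if __name__ == "__main__":
    z, t, y = simulate()
    print("true effect   :", 2.0)
    print("naive estimate:", round(naive_slope(t, y), 3))
    print("IV estimate   :", round(iv_slope(z, t, y), 3))
```

Under these assumptions the naive regression is pulled well below the true effect because `c` is omitted, whereas the IV ratio stays close to it. The IV estimate is only valid if the instrument really is independent of the confounder and influences the outcome solely through the treatment, which is exactly the kind of strict condition circumvention-based methods rely on.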

Causal Inference with Latent Variables: Recent Advances and Future Prospectives

A Survey on Causal Inference

A Versatile Causal Discovery Framework to Allow Causally-Related Hidden Variables

Estimating Possible Causal Effects with Latent Variables Via Adjustment.

Causal discovery and inference: concepts and recent methodological advances

Dynamical causality under invisible confounders

Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond

Recent Developments in Causal Inference and Machine Learning

Causal Inference and Related Statistical Methods

From Query Tools to Causal Architects: Harnessing Large Language Models for Advanced Causal Discovery from Data

Causality for Large Language Models

Methods and tools for causal discovery and causal inference

Causal Inference Meets Deep Learning: A Comprehensive Survey

Causal Mediation Analysis with Hidden Confounders

Causal Inference with Complex Treatments: A Survey

Local Learning for Covariate Selection in Nonparametric Causal Effect Estimation with Latent Variables

Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey

Statistical Approaches for Causal Inference

Differentiable Causal Discovery For Latent Hierarchical Causal Models

Large Language Models for Constrained-Based Causal Discovery

Improving Causal Reasoning in Large Language Models: A Survey