Abstract: We investigate the task of learning causal structure in the presence of latent variables, including locating latent variables and determining their quantity, and identifying causal relationships among both latent and observed variables. To this end, we propose a Generalized Independent Noise (GIN) condition for linear non-Gaussian acyclic causal models that incorporate latent variables, which establishes the independence between a linear combination of certain measured variables and some other measured variables. Specifically, for two observed random vectors $\mathbf{Y}$ and $\mathbf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are independent, where $\omega$ is a non-zero parameter vector determined by the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. We then give necessary and sufficient graphical criteria of the GIN condition in linear non-Gaussian acyclic models. Roughly speaking, GIN implies the existence of a set $\mathcal{S}$ such that $\mathcal{S}$ is causally earlier (w.r.t. the causal ordering) than $\mathbf{Y}$, and that every active (collider-free) path between $\mathbf{Y}$ and $\mathbf{Z}$ must contain a node from $\mathcal{S}$. Interestingly, we find that the independent noise condition (i.e., if there is no confounder, causes are independent of the residual derived from regressing the effect on the causes) can be seen as a special case of GIN. With such a connection between GIN and latent causal structures, we further leverage the proposed GIN condition, together with a well-designed search procedure, to efficiently estimate Linear, Non-Gaussian Latent Hierarchical Models (LiNGLaHs), where latent confounders may also be causally related and may even follow a hierarchical structure. We show that the causal structure of a LiNGLaH is identifiable in light of GIN conditions. Experimental results show the effectiveness of the proposed method.
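
As a reading aid, the condition described in the abstract can be written out explicitly. The formulas below are a sketch under assumed notation that the abstract does not spell out: $\mathbf{Y}$ and $\mathbf{Z}$ are taken to be zero-mean, $\mathbb{E}[\mathbf{Y}\mathbf{Z}^{\intercal}]$ denotes their cross-covariance, and $\beta$ is the ordinary least-squares coefficient. The first line states how $\omega$ is tied to the cross-covariance; the second shows why the regression residual of the independent noise condition is one instance of $\omega^{\intercal}\mathbf{Y}$, obtained by taking $\mathbf{Z}$ to be the causes and $\mathbf{Y}$ to be the effect $X$ stacked with $\mathbf{Z}$.

$$
\omega^{\intercal}\,\mathbb{E}[\mathbf{Y}\mathbf{Z}^{\intercal}] = \mathbf{0}, \quad \omega \neq \mathbf{0}, \qquad \text{GIN holds} \iff \omega^{\intercal}\mathbf{Y} \perp\!\!\!\perp \mathbf{Z}
$$

$$
\mathbf{Y} = \begin{pmatrix} X \\ \mathbf{Z} \end{pmatrix}, \quad \omega = \begin{pmatrix} 1 \\ -\beta \end{pmatrix}, \quad \beta = \mathbb{E}[\mathbf{Z}\mathbf{Z}^{\intercal}]^{-1}\,\mathbb{E}[\mathbf{Z}X] \;\;\Longrightarrow\;\; \omega^{\intercal}\mathbf{Y} = X - \beta^{\intercal}\mathbf{Z}
$$

With this choice, $\omega^{\intercal}\mathbb{E}[\mathbf{Y}\mathbf{Z}^{\intercal}] = \mathbb{E}[X\mathbf{Z}^{\intercal}] - \beta^{\intercal}\mathbb{E}[\mathbf{Z}\mathbf{Z}^{\intercal}] = \mathbf{0}$, so the residual $X - \beta^{\intercal}\mathbf{Z}$ plays the role of $\omega^{\intercal}\mathbf{Y}$.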

What problem does this paper attempt to address?

The paper addresses the problem of estimating causal structure in the presence of latent variables: locating the latent variables, determining their number, and identifying the causal relationships among both latent and observed variables. To this end, the authors propose the "Generalized Independent Noise (GIN)" condition. The condition applies to linear non-Gaussian acyclic causal models that include latent variables and concerns a specific statistical independence between two observed random vectors $\mathbf{Y}$ and $\mathbf{Z}$. Specifically, $\mathbf{Y}$ and $\mathbf{Z}$ satisfy the GIN condition if $\omega^{\intercal}\mathbf{Y}$ is statistically independent of $\mathbf{Z}$, where $\omega$ is a non-zero parameter vector determined by the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. The paper further provides necessary and sufficient graphical conditions for the GIN condition, which clarify how GIN relates to the causal graph. Combining the GIN condition with a carefully designed search procedure makes it possible to estimate Linear, Non-Gaussian Latent Hierarchical Models (LiNGLaHs), where latent confounders may themselves be causally related and may follow a hierarchical structure. The method can therefore handle more complex latent graph structures, as shown in Figure 1, which includes cases with multiple layers of latent variables. Experimental results demonstrate that the proposed method is effective on synthetic data and three real-world datasets. In summary, the main contributions of this paper are:

1. Defining a Generalized Independent Noise condition and showing that it encompasses the previous independent noise condition as a special case.
2. Providing necessary and sufficient graphical conditions for the GIN condition.
3. Using the GIN condition to estimate causal structures involving latent variables and proving that, under mild assumptions, the underlying causal structure of a LiNGLaH can be identified.
4. Addressing some challenges in practical applications and providing more reliable and statistically efficient methods for estimating the LiNGLaH structure from finite samples.
5. Validating the effectiveness of the algorithm on synthetic data and real-world datasets.
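
To make the GIN check concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes that $\omega$ can be taken from the left null space of the sample cross-covariance of $\mathbf{Y}$ and $\mathbf{Z}$, and it replaces a proper independence test with a crude proxy (the largest absolute correlation between simple nonlinear transforms). The function names `gin_omega` and `dependence_score`, the coefficients, and the uniform noise are all illustrative choices. The first simulated case has one latent parent shared by three observed variables, so the linear combination cancels the latent variable; the second has two independent latent parents, so no single $\omega$ can remove both.

```python
import numpy as np

def gin_omega(Y, Z):
    """Return a unit vector omega with omega^T Cov(Y, Z) ~ 0.

    Y has shape (n, p), Z has shape (n, q). Assumes the sample
    cross-covariance has a non-trivial left null space (e.g. p > q).
    """
    Yc = Y - Y.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    cross_cov = Yc.T @ Zc / len(Y)          # shape (p, q)
    # Left singular vector belonging to the smallest singular value.
    U, _, _ = np.linalg.svd(cross_cov, full_matrices=True)
    return U[:, -1]

def dependence_score(e, Z):
    """Crude dependence proxy (NOT a formal test): largest absolute
    correlation between nonlinear transforms of e and columns of Z."""
    e = (e - e.mean()) / e.std()
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    transforms = (np.tanh, np.square)
    return max(
        abs(np.corrcoef(f(e), g(Zs[:, j]))[0, 1])
        for f in transforms
        for g in transforms
        for j in range(Zs.shape[1])
    )

rng = np.random.default_rng(0)
n = 20_000

def noise(scale):
    """Zero-mean uniform (hence non-Gaussian) samples."""
    return rng.uniform(-scale, scale, size=n)

# Case 1: one latent L drives X1, X2, X3, so GIN is expected to hold
# for Y = (X1, X2), Z = (X3).
L = noise(1.0)
X1 = 1.0 * L + noise(0.5)
X2 = 0.8 * L + noise(0.5)
X3 = 1.5 * L + noise(0.5)
Y, Z = np.column_stack([X1, X2]), X3[:, None]
omega = gin_omega(Y, Z)                     # proportional to (0.8, -1.0)
score_gin = dependence_score(Y @ omega, Z)

# Case 2: two independent latents, each driving one component of Y
# and both driving Z; omega^T Y is uncorrelated with Z yet dependent.
L1, L2 = noise(1.0), noise(1.0)
Y_bad = np.column_stack([L1 + noise(0.5), L2 + noise(0.5)])
Z_bad = (L1 + L2 + noise(0.5))[:, None]
omega_bad = gin_omega(Y_bad, Z_bad)
score_violated = dependence_score(Y_bad @ omega_bad, Z_bad)

print(f"one latent : score = {score_gin:.3f}")
print(f"two latents: score = {score_violated:.3f}")
```

In the first case the score should stay near zero, while in the second it should be clearly larger. In both cases $\omega^{\intercal}\mathbf{Y}$ is uncorrelated with $\mathbf{Z}$ by construction, which is why a check that looks beyond second-order statistics is needed; in practice a kernel-based independence test would be a more principled choice than this proxy.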

Generalized Independent Noise Condition for Estimating Causal Structure with Latent Variables

Identification of Linear Non-Gaussian Latent Hierarchical Structure

Causal Discovery of Linear Non-Gaussian Causal Models with Unobserved Confounding

Identification of Latent Variables From Graphical Model Residuals

Causal Discovery in Linear Non-Gaussian Acyclic Model With Multiple Latent Confounders

Functional Linear Non-Gaussian Acyclic Model for Causal Discovery

Identifiable Latent Neural Causal Models

Distinguish Markov Equivalence Classes From Large-Scale Linear Non-Gaussian Data

A Linear Non-Gaussian Acyclic Model for Causal Discovery

Recursively Learning Causal Structures Using Regression-based Conditional Independence Test

Causal Discovery under Latent Class Confounding

Combining Linear Non-Gaussian Acyclic Model with Logistic Regression Model for Estimating Causal Structure from Mixed Continuous and Discrete Data

Measuring Latent Causal Structure

Learning Latent Causal Structures with a Redundant Input Neural Network

Causal GNNs: A GNN-Driven Instrumental Variable Approach for Causal Inference in Networks

Identification of Causal Structure with Latent Variables Based on Higher Order Cumulants

Identifying Weight-Variant Latent Causal Models

Identification of Nonlinear Latent Hierarchical Models

Latent Causal Invariant Model

Conditionally-additive-noise Models for Structure Learning

Learning Causal Structures Based on Divide and Conquer