Abstract: Discovering latent representations of the observed world has become increasingly more relevant in data analysis. Much of the effort concentrates on building latent variables which can be used in prediction problems, such as classification and regression. A related goal of learning latent structure from data is that of identifying which hidden common causes generate the observations, such as in applications that require predicting the effect of policies. This will be the main problem tackled in our contribution: given a dataset of indicators assumed to be generated by unknown and unmeasured common causes, we wish to discover which hidden common causes are those, and how they generate our data. This is possible under the assumption that observed variables are linear functions of the latent causes with additive noise. Previous results in the literature present solutions for the case where each observed variable is a noisy function of a single latent variable. We show how to extend the existing results for some cases where observed variables measure more than one latent variable.

What problem does this paper attempt to address?

The core problem that this paper attempts to solve is: **How to discover latent common causes from data and how to generate a measurement model of the observed data**. Specifically, the author focuses on how to identify the latent variables hidden behind the observed variables, and determine the causal relationships between these latent variables and their relationships with the observed variables.

### Problem Background

In many scientific research fields, especially in social sciences and psychology, researchers often need to deal with latent variables that cannot be directly observed, such as the "industrialization level" and "democratization level" of a country. These latent variables are reflected through a series of observable indicators (such as GNP, energy consumption, etc.). However, traditional factor analysis methods have limitations when dealing with such problems, especially when there are complex causal relationships between latent variables.

### Paper Objectives

The objective of the paper is to develop a new method that can discover latent common causes from a given data set and construct the causal structure between these latent variables and the observed variables. Specifically, the author hopes to solve the following problems:

1. **Identify Latent Variables**: Find out which latent variables exist from the observed data.
2. **Determine Causal Relationships**: Determine the causal relationships between these latent variables and how they affect the observed variables.
3. **Construct a Measurement Model**: Construct a measurement model to describe how latent variables generate observed data.

### Solutions

To solve the above problems, the author proposes a new algorithm, which is based on the following assumptions and steps:

- **Assumptions**:
  - The relationship between the observed variables and the latent variables is linear, and there is additive noise.
  - There are no direct causal relationships between the observed variables (i.e., there are no ancestral relationships between the observed variables).
  - Each pair of observed variables has a common latent ancestor.
- **Steps**:
  - Use tetrad constraints to identify the existence of latent variables.
  - Use specific conditional independence tests (such as T(ABCD)) to distinguish different latent variables.
  - Represent the implicit paths between latent variables by introducing bi-directed edges.
  - Deal with impure measurement models, that is, some observed variables may be jointly affected by multiple latent variables.

### Application Examples

The paper gives a specific example to illustrate the effectiveness of the new method. For example, in the synthetic structure shown in Figure 2(a), the author shows how to reconstruct the correct causal graph (Figure 2(b)) from the observed data. In contrast, traditional factor analysis methods (Figure 2(c) and Figure 2(d)) cannot provide a reasonable explanation.

### Conclusions

In general, this paper proposes a new method that can effectively identify latent variables and construct measurement models under more complex causal structures. This method can not only handle simple pure measurement sub-models, but also deal with cases containing impure indicators. This provides a powerful tool for understanding the latent causal relationships in complex systems.
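The tetrad constraints named in the steps above can be illustrated with a small numerical sketch. It is not the paper's algorithm; it only shows the well-known property that such tests rely on. Under a linear model with additive noise, four indicators of a single latent variable satisfy σ_AB·σ_CD = σ_AC·σ_BD = σ_AD·σ_BC, while indicators split across two correlated latents generally violate part of this. The loadings, noise variances, latent covariance, and function names below are hypothetical values chosen for illustration, and the covariance is computed at the population level rather than estimated from data.

```python
import numpy as np


def implied_covariance(loadings, latent_cov, noise_var):
    """Population covariance of X = loadings @ L + noise (linear, additive noise)."""
    loadings = np.asarray(loadings, dtype=float)
    latent_cov = np.asarray(latent_cov, dtype=float)
    return loadings @ latent_cov @ loadings.T + np.diag(noise_var)


def tetrad_differences(cov, a, b, c, d):
    """Return the three tetrad differences among variables a, b, c, d."""
    return (
        cov[a, b] * cov[c, d] - cov[a, c] * cov[b, d],
        cov[a, c] * cov[b, d] - cov[a, d] * cov[b, c],
        cov[a, d] * cov[b, c] - cov[a, b] * cov[c, d],
    )


# Assumed example 1: four indicators that all measure one latent variable.
one_factor = implied_covariance(
    loadings=[[0.9], [0.7], [1.2], [0.5]],
    latent_cov=[[1.0]],
    noise_var=[0.3, 0.4, 0.2, 0.5],
)
one_factor_tetrads = tetrad_differences(one_factor, 0, 1, 2, 3)

# Assumed example 2: indicators 0, 1 measure one latent and 2, 3 measure
# another; the two latents have covariance 0.4.
two_factor = implied_covariance(
    loadings=[[0.9, 0.0], [0.7, 0.0], [0.0, 1.2], [0.0, 0.5]],
    latent_cov=[[1.0, 0.4], [0.4, 1.0]],
    noise_var=[0.3, 0.4, 0.2, 0.5],
)
two_factor_tetrads = tetrad_differences(two_factor, 0, 1, 2, 3)

print("one latent :", np.round(one_factor_tetrads, 6))
print("two latents:", np.round(two_factor_tetrads, 6))
```

In the one-latent case all three differences vanish up to floating-point error. In the two-latent case only the difference that pairs indicators across the two latents vanishes, which is the kind of pattern that lets a tetrad-based procedure tell different latent variables apart. With real data the covariances would be estimated from samples and the differences assessed with a statistical test rather than compared to zero directly.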

Measuring Latent Causal Structure
