Measuring Latent Causal Structure

Ricardo Silva
DOI: https://doi.org/10.48550/arXiv.1001.1079
2010-01-07
Abstract: Discovering latent representations of the observed world has become increasingly more relevant in data analysis. Much of the effort concentrates on building latent variables which can be used in prediction problems, such as classification and regression. A related goal of learning latent structure from data is that of identifying which hidden common causes generate the observations, such as in applications that require predicting the effect of policies. This will be the main problem tackled in our contribution: given a dataset of indicators assumed to be generated by unknown and unmeasured common causes, we wish to discover which hidden common causes are those, and how they generate our data. This is possible under the assumption that observed variables are linear functions of the latent causes with additive noise. Previous results in the literature present solutions for the case where each observed variable is a noisy function of a single latent variable. We show how to extend the existing results for some cases where observed variables measure more than one latent variable.
Machine Learning
What problem does this paper attempt to address?
The paper asks **how to discover latent common causes from data and how to construct a measurement model of the observed data**. Specifically, the author focuses on identifying the latent variables behind the observed variables, and on determining how those latent variables relate to one another and to the observed variables.

### Problem Background

In many scientific fields, especially the social sciences and psychology, researchers work with latent variables that cannot be observed directly, such as a country's "industrialization level" or "democratization level". These latent variables are reflected in observable indicators (such as GNP, energy consumption, etc.). Traditional factor analysis has limitations in this setting, especially when the latent variables are themselves causally related in complex ways.

### Paper Objectives

The objective of the paper is to develop a method that discovers latent common causes from a given dataset and constructs the causal structure linking these latent variables to the observed variables. Specifically, the author aims to:

1. **Identify Latent Variables**: Find out which latent variables exist from the observed data.
2. **Determine Causal Relationships**: Determine the causal relationships between these latent variables and how they affect the observed variables.
3. **Construct a Measurement Model**: Construct a measurement model that describes how latent variables generate the observed data.

### Solutions

To solve these problems, the author proposes a new algorithm based on the following assumptions and steps:

- **Assumptions**:
  - The observed variables are linear functions of the latent variables, with additive noise.
  - There are no direct causal relationships between the observed variables (i.e., no observed variable is an ancestor of another).
  - Each pair of observed variables has a common latent ancestor.
- **Steps**:
  - Use tetrad constraints to identify the existence of latent variables.
  - Use specific conditional independence tests (such as T(ABCD)) to distinguish different latent variables.
  - Represent the implicit paths between latent variables by introducing bi-directed edges.
  - Deal with impure measurement models, that is, models in which some observed variables are jointly affected by multiple latent variables.

### Application Examples

The paper illustrates the method with a synthetic example. For the structure shown in Figure 2(a), the author shows how to reconstruct the correct causal graph (Figure 2(b)) from the observed data. In contrast, traditional factor analysis methods (Figure 2(c) and Figure 2(d)) cannot provide a reasonable explanation.

### Conclusions

The paper proposes a method that identifies latent variables and constructs measurement models under more complex causal structures than earlier work covers. It handles not only pure measurement sub-models but also cases containing impure indicators, which makes it useful for studying latent causal relationships in complex systems.
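
As an illustration of the tetrad constraints mentioned in the solution steps, the sketch below builds the population covariance matrix implied by a linear measurement model and checks the three tetrad differences for four indicators. It is not the paper's algorithm: the function names `implied_covariance` and `tetrad_differences`, the loadings, and the noise variances are all hypothetical, chosen only for the example. It relies on the standard fact that when four indicators share a single latent parent, the covariances satisfy σ_AB σ_CD = σ_AC σ_BD = σ_AD σ_BC, whereas with two correlated latent parents only one of these equalities survives.

```python
import numpy as np


def implied_covariance(loadings, latent_cov, noise_var):
    """Covariance of X = loadings @ L + noise (hypothetical helper).

    Assumes independent additive noise with the given variances.
    """
    loadings = np.asarray(loadings, dtype=float)
    latent_cov = np.asarray(latent_cov, dtype=float)
    return loadings @ latent_cov @ loadings.T + np.diag(noise_var)


def tetrad_differences(cov, a, b, c, d):
    """Return the three tetrad differences for indicators a, b, c, d."""
    return (
        cov[a, b] * cov[c, d] - cov[a, c] * cov[b, d],
        cov[a, c] * cov[b, d] - cov[a, d] * cov[b, c],
        cov[a, b] * cov[c, d] - cov[a, d] * cov[b, c],
    )


noise = [0.3, 0.4, 0.2, 0.5]  # illustrative noise variances

# Case 1: all four indicators measure one latent variable.
one_factor = implied_covariance(
    loadings=[[0.9], [0.7], [1.2], [0.5]],
    latent_cov=[[1.0]],
    noise_var=noise,
)

# Case 2: indicators 0, 1 measure one latent; 2, 3 measure a second,
# correlated latent (correlation 0.4).
two_factor = implied_covariance(
    loadings=[[0.9, 0.0], [0.7, 0.0], [0.0, 1.2], [0.0, 0.5]],
    latent_cov=[[1.0, 0.4], [0.4, 1.0]],
    noise_var=noise,
)

one_factor_tetrads = tetrad_differences(one_factor, 0, 1, 2, 3)
two_factor_tetrads = tetrad_differences(two_factor, 0, 1, 2, 3)

print("one latent :", np.round(one_factor_tetrads, 6))
print("two latents:", np.round(two_factor_tetrads, 6))
```

In the one-latent case all three differences vanish (up to floating-point error), while in the two-latent case only the middle one does, which is the kind of pattern that lets tetrad constraints separate indicators by latent parent. With real data the covariances are estimated from samples, so the differences would have to be assessed with a statistical test rather than compared with zero directly.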