Abstract: We are interested in the problem of learning the directed acyclic graph (DAG) when data are generated from a linear structural equation model (SEM) and the causal structure can be characterized by a polytree. Under Gaussian polytree models, we study sufficient conditions on the sample sizes for the well-known Chow-Liu algorithm to exactly recover both the skeleton and the equivalence class of the polytree, which is uniquely represented by a CPDAG. On the other hand, necessary conditions on the required sample sizes for both skeleton and CPDAG recovery are also derived in terms of information-theoretic lower bounds, which match the respective sufficient conditions and thereby give a sharp characterization of the difficulty of these tasks. We also consider the problem of inverse correlation matrix estimation under the linear polytree models, and establish the estimation error bound in terms of the dimension and the total number of v-structures. In addition, we consider an extension to group linear polytree models, in which each node represents a group of variables. Our theoretical findings are illustrated by comprehensive numerical simulations, and experiments on benchmark data also demonstrate the robustness of polytree learning when the true graphical structures can only be approximated by polytrees.
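
For Gaussian data the Chow-Liu algorithm reduces to a maximum-weight spanning tree problem: the mutual information between two jointly Gaussian variables is -½·log(1 − ρ²), which increases with |ρ|, so ranking variable pairs by absolute sample correlation produces the same tree. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It simulates a small linear polytree SEM and recovers the skeleton with Kruskal's algorithm. The simulator `simulate_linear_sem`, the common edge weight 0.8, unit-variance noise, and topologically numbered nodes are all hypothetical choices made for the example.

```python
import numpy as np


def simulate_linear_sem(edges, n_nodes, n_samples, weight=0.8, seed=0):
    """Hypothetical simulator: X_j = sum over parents i of weight * X_i + eps_j.

    Assumes eps_j ~ N(0, 1) and that nodes are numbered in a topological order
    (every parent < child), so each child is computed after its parents are final.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_nodes))  # start from the noise terms
    for child in range(n_nodes):
        for parent, c in edges:
            if c == child:
                X[:, child] += weight * X[:, parent]
    return X


def chow_liu_skeleton(X):
    """Maximum-weight spanning tree with edge weights |sample correlation|.

    For Gaussian data the pairwise mutual information -0.5 * log(1 - rho^2)
    is increasing in |rho|, so both weightings select the same tree.
    """
    p = X.shape[1]
    R = np.corrcoef(X, rowvar=False)
    pairs = sorted(
        ((abs(R[i, j]), i, j) for i in range(p) for j in range(i + 1, p)),
        reverse=True,
    )
    root = list(range(p))  # union-find forest used by Kruskal's algorithm

    def find(u):
        while root[u] != u:
            root[u] = root[root[u]]  # path halving
            u = root[u]
        return u

    tree = set()
    for _, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not close a cycle
            root[ri] = rj
            tree.add((i, j))
            if len(tree) == p - 1:
                break
    return tree


# Polytree with a v-structure 0 -> 2 <- 1 followed by the chain 2 -> 3 -> 4.
edges = [(0, 2), (1, 2), (2, 3), (3, 4)]
X = simulate_linear_sem(edges, n_nodes=5, n_samples=5000)
print(sorted(chow_liu_skeleton(X)))
```

Note that the skeleton is returned as undirected pairs (i, j) with i < j; edge directions are not identifiable from this step alone.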

What problem does this paper attempt to address?

The paper studies how to learn the structure of a directed acyclic graph (DAG) from data generated by a particular class of structural equation models (SEMs): linear polytree SEMs. The issues it addresses can be summarized as follows:

1. **Skeleton Recovery**:
   - The paper investigates how to learn the skeleton of a DAG when the data are generated by a linear SEM whose causal structure is a polytree.
   - Under the Gaussian polytree model, it studies sufficient conditions on the sample size for the well-known Chow-Liu algorithm to exactly recover the polytree skeleton.
   - It also derives information-theoretic lower bounds on the sample size necessary for skeleton recovery. These match the sufficient conditions, giving a sharp characterization of the difficulty of skeleton recovery.
2. **Equivalence Class / CPDAG Recovery**:
   - The paper also considers recovering the equivalence class of a polytree from data; this class is uniquely represented by a completed partially directed acyclic graph (CPDAG).
   - It gives sufficient conditions on the sample size for recovering the CPDAG, together with matching information-theoretic lower bounds, thus sharply characterizing the difficulty of CPDAG recovery.
3. **Inverse Correlation Matrix Estimation**:
   - The paper further considers estimating the inverse correlation matrix under the linear polytree model and establishes an estimation error bound in terms of the dimension and the total number of v-structures.

In summary, the core objective of the paper is to determine under which sample-size conditions the Chow-Liu algorithm and related procedures can exactly recover the skeleton and the equivalence class (CPDAG) of a polytree model, and to provide theoretical guarantees for these tasks.
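
Once the skeleton is known, the polytree structure makes orientation simple: any two neighbours i and k of a node j are non-adjacent, and, given nonzero edge weights, they are marginally uncorrelated exactly when j is a collider (i → j ← k). The sketch below marks such v-structures by thresholding correlations and then propagates orientations with Meek's first rule, which is the only Meek rule that can fire on a tree skeleton. The function name `orient_polytree` and the fixed threshold `tau` are hypothetical; the paper's actual CPDAG procedure and test statistic may differ, and this sketch does not resolve conflicting orientations that noisy data could produce.

```python
import itertools

import numpy as np


def orient_polytree(skeleton, R, tau=0.1):
    """Return (directed, undirected) edge sets of an estimated polytree CPDAG.

    skeleton: set of undirected edges (i, j) with i < j forming a tree.
    R: correlation matrix (NumPy array) of the variables.
    tau: hypothetical threshold below which |R[i, k]| is treated as zero.
    """
    nbrs = {}
    for i, j in skeleton:
        nbrs.setdefault(i, set()).add(j)
        nbrs.setdefault(j, set()).add(i)

    directed = set()
    # Step 1: v-structures. Two neighbours i, k of j are never adjacent in a tree,
    # and they are uncorrelated exactly when j is a collider i -> j <- k.
    for j, nb in nbrs.items():
        for i, k in itertools.combinations(sorted(nb), 2):
            if abs(R[i, k]) < tau:
                directed.update({(i, j), (k, j)})

    # Step 2: Meek rule 1. If a -> b and b - c is still undirected, orient b -> c
    # (a and c are automatically non-adjacent because the skeleton is a tree).
    changed = True
    while changed:
        changed = False
        for a, b in list(directed):
            for c in nbrs[b]:
                if c != a and (b, c) not in directed and (c, b) not in directed:
                    directed.add((b, c))
                    changed = True

    undirected = {(i, j) for i, j in skeleton
                  if (i, j) not in directed and (j, i) not in directed}
    return directed, undirected


# Population correlations of the polytree 0 -> 2 <- 1, 2 -> 3 -> 4 with
# illustrative edge weights 0.8 and unit noise variances (X = B X + eps).
B = np.zeros((5, 5))
for parent, child in [(0, 2), (1, 2), (2, 3), (3, 4)]:
    B[child, parent] = 0.8
A = np.linalg.inv(np.eye(5) - B)  # X = A @ eps
Sigma = A @ A.T
sd = np.sqrt(np.diag(Sigma))
R = Sigma / np.outer(sd, sd)

directed, undirected = orient_polytree({(0, 2), (1, 2), (2, 3), (3, 4)}, R)
print(sorted(directed), sorted(undirected))
```

Here the v-structure at node 2 forces 2 → 3 and then 3 → 4, so every edge ends up oriented; a plain chain without a collider would instead leave all of its edges undirected in the CPDAG.
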
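Additionally, the paper validates the theoretical results through numerical simulations and evaluates the methods on benchmark data, where polytree learning remains robust even when the true graphical structure can only be approximated by a polytree.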
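
The inverse correlation matrix estimation problem mentioned in the abstract has a transparent sparsity pattern under a linear polytree SEM. Writing the model as X = BX + ε, with B[j, i] the weight of edge i → j and noise covariance Ω = diag(ω), the precision matrix is Θ = (I − B)ᵀ Ω⁻¹ (I − B), so an off-diagonal entry Θ_ik can be nonzero only when i and k are adjacent or share a child. In a polytree, two parents of the same child are never adjacent and share at most one child, so with nonzero edge weights the number of nonzero entries above the diagonal equals the number of edges plus the number of v-structures. The inverse correlation matrix D Θ D, where D holds the standard deviations, has the same support. This count offers one intuitive link to the quantities appearing in the abstract's error bound; the sketch below only checks the count numerically and is not the paper's estimator. All helper names are hypothetical.

```python
import numpy as np


def polytree_precision(B, noise_var):
    """Precision matrix of X = B X + eps, where B[j, i] is the weight of edge i -> j."""
    p = B.shape[0]
    I_minus_B = np.eye(p) - B
    return I_minus_B.T @ np.diag(1.0 / noise_var) @ I_minus_B


def inverse_correlation(Theta):
    """Inverse correlation matrix D Theta D, with D = diag of standard deviations."""
    sd = np.sqrt(np.diag(np.linalg.inv(Theta)))
    return Theta * np.outer(sd, sd)


def count_v_structures(B):
    """Number of unordered parent pairs {i, k} sharing a child j (i -> j <- k)."""
    n_parents = np.count_nonzero(B, axis=1)
    return int(np.sum(n_parents * (n_parents - 1) // 2))


def upper_support_size(M, tol=1e-10):
    """Count the nonzero entries strictly above the diagonal."""
    return int(np.count_nonzero(np.abs(np.triu(M, k=1)) > tol))


# Polytree 0 -> 2 <- 1, 2 -> 3 -> 4 with illustrative weights and unit noise.
p = 5
B = np.zeros((p, p))
for parent, child in [(0, 2), (1, 2), (2, 3), (3, 4)]:
    B[child, parent] = 0.8
Theta = polytree_precision(B, noise_var=np.ones(p))
n_edges = np.count_nonzero(B)
print(upper_support_size(inverse_correlation(Theta)), n_edges + count_v_structures(B))
# prints: 5 5
```

The extra nonzero entry beyond the four edges is Θ₀₁, created by the co-parents 0 and 1 of node 2.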

Learning Linear Polytree Structural Equation Models

Linear Polytree Structural Equation Models: Structural Learning and Inverse Correlation Estimation
