Multi-granularity Causal Structure Learning

Jiaxuan Liang, Jun Wang, Guoxian Yu, Shuyin Xia, Guoyin Wang
2023-12-09
Abstract: Unveiling, modeling, and comprehending the causal mechanisms underpinning natural phenomena are fundamental endeavors across myriad scientific disciplines. Meanwhile, new knowledge emerges when causal relationships are discovered from data. Existing causal learning algorithms predominantly focus on the isolated effects of variables and overlook the intricate interplay of multiple variables and their collective behavioral patterns. Furthermore, the ubiquity of high-dimensional data exacts a substantial time cost from causal learning algorithms. In this paper, we develop a novel method called MgCSL (Multi-granularity Causal Structure Learning), which first leverages a sparse auto-encoder to explore coarse-graining strategies and causal abstractions from micro-variables to macro-variables. MgCSL then takes the multi-granularity variables as inputs to train multilayer perceptrons and to delve into the causality between variables. To enhance efficacy on high-dimensional data, MgCSL introduces a simplified acyclicity constraint to efficiently search for the directed acyclic graph among variables. Experimental results show that MgCSL outperforms competitive baselines and finds explainable causal connections in fMRI datasets.
Machine Learning
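The abstract says MgCSL uses a sparse auto-encoder (SAE) to coarse-grain micro-variables into macro-variables, but this summary does not give the SAE's architecture. The following is a minimal pure-Python sketch under stated assumptions:

- A single ReLU encoder layer produces k activations that play the role of macro-variables.
- A linear decoder reconstructs the micro-variables.
- An L1 penalty on the activations enforces sparsity.

The function names (`sae_forward`, `sae_loss`, `macro_assignment`) and the penalty weight `lam` are hypothetical. So is the rule that maps each micro-variable to the macro unit with the largest absolute encoder weight. Training (minimizing `sae_loss` by gradient descent) is omitted.

```python
# Hypothetical sketch of SAE-based coarse-graining; not the MgCSL implementation.


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sae_forward(x, W_enc, b_enc, W_dec):
    """Encode d micro-variables into k candidate macro-variables, then decode back to d values."""
    z = relu([a + b for a, b in zip(matvec(W_enc, x), b_enc)])  # k macro activations
    x_hat = matvec(W_dec, z)                                    # d reconstructed micro-variables
    return z, x_hat


def sae_loss(batch, W_enc, b_enc, W_dec, lam=0.1):
    """Average squared reconstruction error plus an L1 sparsity penalty on the macro activations."""
    total = 0.0
    for x in batch:
        z, x_hat = sae_forward(x, W_enc, b_enc, W_dec)
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
        total += lam * sum(abs(v) for v in z)
    return total / len(batch)


def macro_assignment(W_enc):
    """Hypothetical coarse-graining rule: map each micro-variable to the macro unit
    whose encoder weight on it has the largest magnitude."""
    k, d = len(W_enc), len(W_enc[0])
    return [max(range(k), key=lambda m: abs(W_enc[m][i])) for i in range(d)]


# Toy example: 4 micro-variables, 2 macro units that each pool one pair.
W_enc = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
b_enc = [0.0, 0.0]
W_dec = [[0.5, 0.0], [0.5, 0.0], [0.0, 0.5], [0.0, 0.5]]
z, x_hat = sae_forward([1.0, 1.0, 2.0, 2.0], W_enc, b_enc, W_dec)
print(z, x_hat, macro_assignment(W_enc))
```

In this toy case the two macro units reconstruct the input exactly, so the loss reduces to the sparsity term alone.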
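MgCSL trains one MLP per micro-variable, and each MLP takes both micro- and macro-variables as inputs. NOTEARS-MLP offers one common way to read a causal graph off such per-variable networks. It scores candidate edge i → j by the L2 norm of the first-layer weights through which input i enters the MLP for j. This rule is assumed here, not taken from the paper.

The hypothetical helpers below (`first_layer_adjacency`, `threshold_edges`, threshold `tau`) compute that weighted adjacency and threshold it. With d micro-variables and k macro-variables as inputs, the matrix has d + k rows (sources) and d columns (targets).

```python
import math

# Hypothetical sketch; the edge-scoring rule follows NOTEARS-MLP, not necessarily MgCSL.


def first_layer_adjacency(first_layers):
    """first_layers[j] is the (hidden x n_inputs) first-layer weight matrix of the MLP
    that predicts micro-variable j. Returns W with W[i][j] = L2 norm of the weights
    through which input i enters MLP j, i.e. the strength of candidate edge i -> j."""
    n_targets = len(first_layers)
    n_inputs = len(first_layers[0][0])
    W = [[0.0] * n_targets for _ in range(n_inputs)]
    for j, layer in enumerate(first_layers):
        for i in range(n_inputs):
            W[i][j] = math.sqrt(sum(row[i] ** 2 for row in layer))
    return W


def threshold_edges(W, tau=0.3):
    """Keep candidate edges whose strength exceeds a (hypothetical) threshold tau."""
    return [(i, j) for i, row in enumerate(W) for j, w in enumerate(row) if w > tau]


# Toy example: 2 micro-variables (inputs 0, 1) plus 1 macro-variable (input 2).
# The MLP for x0 uses only the macro input; the MLP for x1 uses x0.
# Self-inputs are zero here; a real model would mask them explicitly.
mlp_x0 = [[0.0, 0.0, 3.0], [0.0, 0.0, 4.0]]
mlp_x1 = [[0.6, 0.0, 0.0], [0.8, 0.0, 0.0]]
W = first_layer_adjacency([mlp_x0, mlp_x1])
print(threshold_edges(W))
```

In the toy example, the rows of `W` beyond the micro-variables carry macro-to-micro edge strengths. Here that is the edge from the macro input to x0.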
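This summary does not give the exact form of MgCSL's simplified acyclicity constraint. As a stand-in, the sketch below implements the known polynomial acyclicity measure h(W) = tr((I + W∘W/d)^d) - d from DAG-GNN. It equals zero exactly when the weighted graph W has no directed cycle; NOTEARS uses tr(e^{W∘W}) - d instead. This is a swapped-in technique for illustration, not the paper's constraint, and the function names are hypothetical.

```python
# Swapped-in DAG-GNN-style polynomial constraint; NOT the paper's simplified constraint.


def matmul(A, B):
    """Dense matrix product for lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def acyclicity(W):
    """Polynomial acyclicity measure h(W) = tr((I + (W * W) / d)^d) - d,
    where W * W is the elementwise square. Expanding the power gives a positive
    combination of tr(A^k) for k = 1..d, with A = W * W >= 0, and tr(A^k) > 0 iff the
    graph has a closed walk of length k. So h(W) >= 0, with equality iff W is acyclic."""
    d = len(W)
    M = [[(1.0 if i == j else 0.0) + W[i][j] ** 2 / d for j in range(d)] for i in range(d)]
    P = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # identity
    for _ in range(d):
        P = matmul(P, M)
    return sum(P[i][i] for i in range(d)) - d


chain = [[0.0, 0.8, 0.0], [0.0, 0.0, -1.2], [0.0, 0.0, 0.0]]  # x0 -> x1 -> x2
two_cycle = [[0.0, 1.0], [1.0, 0.0]]                          # x0 <-> x1
print(acyclicity(chain), acyclicity(two_cycle))
```

A constraint of this kind needs only matrix products, not a matrix exponential. This naive version performs d dense multiplications; computing the power by repeated squaring would reduce that to roughly log2(d) multiplications.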
What problem does this paper attempt to address?
The problem this paper addresses is that existing causal learning algorithms mainly focus on the isolated effects of single variables. They ignore the complex interactions among multiple variables and their collective behavior patterns. In addition, the prevalence of high-dimensional data significantly increases the time cost of causal learning algorithms.

The paper therefore proposes a new method, Multi-Granularity Causal Structure Learning (MgCSL). MgCSL explores coarse-graining strategies and causal abstractions from micro-variables to macro-variables. It then trains Multi-Layer Perceptrons (MLPs) that take these multi-granularity variables as inputs to uncover the causal relationships among variables. To improve efficiency on high-dimensional data, MgCSL introduces a simplified acyclicity constraint to efficiently search for a Directed Acyclic Graph (DAG) among the variables. Experimental results show that MgCSL outperforms competitive baselines on multiple benchmarks and discovers interpretable causal connections in fMRI datasets.

### Specific Problem Description

1. **Limitations of Existing Causal Learning Algorithms**:
   - **Ignoring Complex Interactions**: Existing algorithms mainly focus on the isolated effects of single variables and overlook the complex interactions among multiple variables and their collective behavior patterns.
   - **Difficulty in Processing High-Dimensional Data**: High-dimensional data significantly increases the time cost of causal learning algorithms, which limits their practical use.
2. **Research Objectives**:
   - **Multi-Granularity Causal Structure Learning**: Develop a method that explores coarse-graining strategies and causal abstractions from micro-variables to macro-variables.
   - **Improve the Efficiency of High-Dimensional Data Processing**: Introduce a simplified acyclicity constraint to efficiently search for a DAG among variables, thereby improving efficiency on high-dimensional data.

### Solutions

1. **Multi-Granularity Causal Structure Learning (MgCSL)**:
   - **Sparse Auto-Encoder (SAE)**: Automatically coarsens micro-variables into latent macro-variables.
   - **Multi-Layer Perceptron (MLP)**: An MLP is built for each micro-variable, with both micro-variables and macro-variables as inputs, to explore the underlying causal mechanisms.
   - **Simplified Acyclicity Constraint**: A simplified acyclicity constraint is introduced to efficiently search for a DAG among variables.
2. **Experimental Verification**:
   - **Synthetic Datasets**: Random DAGs are generated with the Erdős-Rényi (ER) and Scale-Free (SF) schemes. Performance is tested with d ∈ {20, 50, 100} variables at degree 2.
   - **Real Datasets**: The Sachs dataset records protein and phospholipid expression levels in human cells. It is used to evaluate how well the method recovers the causal relationships among these molecules.

### Experimental Results

- **Precision**: On multi-granularity synthetic datasets, MgCSL achieves significantly higher precision than the baseline methods.
- **Structural Hamming Distance (SHD)**: On multi-granularity synthetic datasets, MgCSL achieves a significantly lower SHD than the baseline methods.
- **Runtime**: On multi-granularity synthetic datasets, MgCSL runs significantly faster than the baseline methods.

### Conclusion

MgCSL addresses the limitations of existing causal learning algorithms by introducing multi-granularity causal structure learning and a simplified acyclicity constraint. It performs especially well on high-dimensional data.
Experimental results verify the superior performance of MgCSL in multi-granularity causal structure learning.
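The synthetic benchmarks described above use random ER and SF DAGs with d ∈ {20, 50, 100} variables and degree 2. A minimal generator is sketched below. It assumes that "degree 2" means an expected average node degree (in plus out) of 2; some benchmarks instead use 2d expected edges, so treat this mapping as an assumption.

- **ER generator**: links each node pair with probability degree/(d-1) and orients edges along a random node order.
- **SF generator**: uses Barabási-Albert-style preferential attachment with m parent edges per new node.

All function names and parameters are hypothetical.

```python
import random

# Hypothetical benchmark generators; the paper's exact generator settings are not given here.


def random_er_dag(d, avg_degree=2.0, seed=0):
    """Erdős-Rényi DAG: each unordered pair is linked with probability
    avg_degree / (d - 1), so the expected in+out degree per node is avg_degree.
    Edges point from earlier to later nodes in a random order, so no cycles form."""
    rng = random.Random(seed)
    p = min(1.0, avg_degree / (d - 1))
    order = list(range(d))
    rng.shuffle(order)
    B = [[0] * d for _ in range(d)]
    for a in range(d):
        for b in range(a + 1, d):
            if rng.random() < p:
                B[order[a]][order[b]] = 1
    return B


def random_sf_dag(d, m=1, seed=0):
    """Scale-free DAG via preferential attachment: each new node receives edges
    from up to m existing nodes chosen with probability proportional to degree + 1."""
    rng = random.Random(seed)
    B = [[0] * d for _ in range(d)]
    deg = [0] * d
    for new in range(1, d):
        weights = [deg[v] + 1 for v in range(new)]
        parents = set()
        while len(parents) < min(m, new):
            parents.add(rng.choices(range(new), weights=weights)[0])
        for v in parents:
            B[v][new] = 1  # old -> new keeps the graph acyclic
            deg[v] += 1
            deg[new] += 1
    # Relabel nodes randomly so the topological order is not simply 0, 1, ..., d-1.
    perm = list(range(d))
    rng.shuffle(perm)
    return [[B[perm[i]][perm[j]] for j in range(d)] for i in range(d)]


def is_dag(B):
    """Kahn's algorithm: the graph is acyclic iff every node can be removed in topological order."""
    d = len(B)
    indeg = [sum(B[i][j] for i in range(d)) for j in range(d)]
    ready = [j for j in range(d) if indeg[j] == 0]
    removed = 0
    while ready:
        i = ready.pop()
        removed += 1
        for j in range(d):
            if B[i][j]:
                indeg[j] -= 1
                if indeg[j] == 0:
                    ready.append(j)
    return removed == d


for d in (20, 50, 100):
    er, sf = random_er_dag(d, seed=d), random_sf_dag(d, seed=d)
    print(d, is_dag(er), is_dag(sf), sum(map(sum, er)), sum(map(sum, sf)))
```

With m=1 the SF graph is a tree with exactly d - 1 edges, which matches an average degree close to 2.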
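Precision and SHD, the metrics reported in the experimental results, can be computed from binary adjacency matrices as sketched below. This sketch assumes that edge i → j is encoded as `B[i][j] == 1` and counts a reversed edge as a single SHD error. Conventions differ between papers and toolkits (some count a reversal as two errors), so values from this hypothetical helper need not match the paper's exactly.

```python
# Hypothetical evaluation helpers; SHD conventions vary across papers and toolkits.


def shd(B_true, B_est):
    """Structural Hamming Distance between two directed graphs given as 0/1 adjacency
    matrices (B[i][j] == 1 means i -> j). Each node pair whose edge status differs
    (missing, extra, or reversed edge) adds 1."""
    d = len(B_true)
    dist = 0
    for i in range(d):
        for j in range(i + 1, d):
            if (B_true[i][j], B_true[j][i]) != (B_est[i][j], B_est[j][i]):
                dist += 1
    return dist


def precision(B_true, B_est):
    """Fraction of estimated directed edges that appear, with the same direction, in the true graph."""
    d = len(B_true)
    predicted = [(i, j) for i in range(d) for j in range(d) if B_est[i][j]]
    if not predicted:
        return 0.0
    return sum(1 for i, j in predicted if B_true[i][j]) / len(predicted)


true_g = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # x0 -> x1 -> x2
est_g = [[0, 1, 1], [0, 0, 0], [0, 1, 0]]   # x0 -> x1, extra x0 -> x2, reversed x2 -> x1
print(shd(true_g, est_g), precision(true_g, est_g))
```

Here one extra edge and one reversed edge give an SHD of 2, and only one of the three estimated edges is correct.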