Functional Linear Non-Gaussian Acyclic Model for Causal Discovery

Tian-Le Yang, Kuang-Yao Lee, Kun Zhang, Joe Suzuki
2024-01-18
Abstract: In causal discovery, non-Gaussianity has been used to characterize the complete configuration of a Linear Non-Gaussian Acyclic Model (LiNGAM), encompassing both the causal ordering of variables and their respective connection strengths. However, LiNGAM can only deal with the finite-dimensional case. To expand this concept, we extend the notion of variables to encompass vectors and even functions, leading to the Functional Linear Non-Gaussian Acyclic Model (Func-LiNGAM). Our motivation stems from the desire to identify causal relationships in brain-effective connectivity tasks involving, for example, fMRI and EEG datasets. We demonstrate why the original LiNGAM fails to handle these inherently infinite-dimensional datasets and explain the availability of functional data analysis from both empirical and theoretical perspectives. We establish theoretical guarantees of the identifiability of the causal relationship among non-Gaussian random vectors and even random functions in infinite-dimensional Hilbert spaces. To address the issue of sparsity in discrete time points within intrinsic infinite-dimensional functional data, we propose optimizing the coordinates of the vectors using functional principal component analysis. Experimental results on synthetic data verify the ability of the proposed framework to identify causal relationships among multivariate functions using the observed samples. For real data, we focus on analyzing the brain connectivity patterns derived from fMRI data.
Machine Learning, Statistics Theory, Neurons and Cognition, Methodology
### What problem does this paper attempt to solve?

This paper addresses causal discovery for infinite-dimensional data. The authors propose the Functional Linear Non-Gaussian Acyclic Model (Func-LiNGAM), which extends the Linear Non-Gaussian Acyclic Model (LiNGAM). LiNGAM handles only finite-dimensional data, whereas Func-LiNGAM handles variables that are vectors or functions, which makes it suitable for fMRI and EEG datasets in brain effective connectivity tasks.

#### Main problems:

1. **Extension from finite-dimensional to infinite-dimensional data**: Existing LiNGAM methods handle only finite-dimensional data and cannot be applied directly to infinite-dimensional data such as functional data. This limits their use in practice, especially for continuous time-series data.
2. **Non-Gaussianity and causal relationship identification**: In causal discovery, non-Gaussianity is used to characterize the complete configuration of a LiNGAM, including the causal order of the variables and their connection strengths. How to use non-Gaussianity to identify causal relationships in infinite-dimensional data remains a challenge.
3. **Sparsity of discrete time points**: In real data, especially brain imaging data, the functions are observed only at discrete time points that are often sparse. The coordinate representation of such sparsely observed data must be chosen well for causal relationships to be identified.

#### Solutions:

- **Func-LiNGAM model**: By extending the notion of a variable to vectors and functions, the authors construct a framework that models the causal structure of multivariate functional data.
- **Theoretical guarantee**: The paper establishes the identifiability of causal relationships among non-Gaussian random vectors and random functions in infinite-dimensional Hilbert spaces.
- **Optimization method**: To address the sparsity of discrete time points, the paper proposes optimizing the coordinates of the vectors using functional principal component analysis (FPCA).
- **Experimental verification**: Experiments on synthetic data verify that the framework identifies causal relationships among multivariate functions from observed samples. On real data, the method is applied to brain connectivity patterns derived from fMRI data.

In short, the paper develops a causal discovery model that identifies causal relationships in infinite-dimensional data such as brain imaging data, providing a tool for neuroscience, medicine, and related fields.
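
The FPCA step mentioned among the solutions can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes the curves are observed on a common, uniformly spaced dense grid, whereas the paper targets sparsely sampled time points, which would need smoothing or a sparse-data FPCA estimator first. The function name `fpca_scores` and the simulated curves are hypothetical.

```python
import numpy as np


def fpca_scores(curves, n_components):
    """Project discretized curves onto their leading principal components.

    curves: array of shape (n_samples, n_grid), one curve per row, all
    sampled on the same uniform grid (an assumption of this sketch).
    """
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # SVD of the centered data: rows of vt are the discretized
    # eigenfunctions of the sample covariance, ordered by variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Scores are the coordinates of each curve in the truncated basis.
    scores = centered @ components.T
    explained = (singular_values[:n_components] ** 2).sum() / (singular_values ** 2).sum()
    return scores, components, mean_curve, explained


# Simulated example (hypothetical data): curves built from two basis
# functions with non-Gaussian (uniform) coefficients plus small noise.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
basis = np.vstack([np.sin(2 * np.pi * grid), np.cos(2 * np.pi * grid)])
coefs = rng.uniform(-1.0, 1.0, size=(200, 2)) * np.array([2.0, 1.0])
curves = coefs @ basis + 0.01 * rng.standard_normal((200, 50))

scores, components, mean_curve, explained = fpca_scores(curves, n_components=2)
print(scores.shape, round(float(explained), 4))
```

Each random function is reduced to a short score vector, and those finite-dimensional coordinates are what a LiNGAM-type procedure could then operate on. How many components to keep is a modeling choice that this sketch does not address.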