Comparative Study of Causal Discovery Methods for Cyclic Models with Hidden Confounders

Boris Lorbeer, Mustafa Mohsen
2024-01-23
Abstract: Nowadays, the need for causal discovery is ubiquitous. A better understanding not just of the stochastic dependencies between parts of a system but also of the actual cause-effect relations is essential for all parts of science. Thus, the need for reliable methods to detect causal directions is growing constantly. In the last 50 years, many causal discovery algorithms have emerged, but most of them are applicable only under the assumption that the systems have no feedback loops and that they are causally sufficient, i.e. that there are no unmeasured subsystems that can affect multiple measured variables. This is unfortunate, since those restrictions often cannot be presumed in practice. Feedback is an integral feature of many processes, and real-world systems are rarely completely isolated and fully measured. Fortunately, several techniques that can cope with cyclic, causally insufficient systems have been developed in recent years. With multiple methods available, however, a practical application of those algorithms now requires knowledge of their respective strengths and weaknesses. Here, we focus on the problem of causal discovery for sparse linear models that are allowed to have cycles and hidden confounders. We present a comprehensive comparative study of four causal discovery techniques: two versions of the LLC method [10] and two variants of the ASP-based algorithm [11]. The evaluation investigates the performance of those techniques in various experiments with multiple interventional setups and different dataset sizes.
Machine Learning, Methodology
What problem does this paper attempt to address?

This paper addresses how to discover causal relationships effectively when the system under study may contain feedback cycles and hidden confounders. Concretely, it compares the performance of different causal discovery methods on sparse linear models in which cycles and hidden confounding variables are allowed. The methods are two versions of the LLC method (LLC-NF and LLC-F) and two ASP-based methods (ASP-d and ASP-s), evaluated across several experimental settings and dataset sizes.

### Background of the Paper

Causal analysis plays an important role in modern science, especially in fields such as medicine, biology, cognitive science, economics, predictive maintenance, root cause analysis, physics, and machine learning. Traditional data analysis mainly studies the probabilistic properties of data, i.e. the probability distributions involved, in order to predict new data. Causal analysis, by contrast, aims to learn not only about the data but also about the system that generates it, in particular the actual causal mechanisms in that system.

### Research Motivation

Most existing causal discovery algorithms assume that the system is acyclic (i.e. has no feedback loops) and causally sufficient (i.e. no unmeasured subsystem affects multiple measured variables). In practice, these assumptions are often violated: many real-world systems contain feedback loops and are rarely completely isolated and fully measured. For this reason, several causal discovery techniques that can handle cyclic and causally insufficient systems have been developed in recent years.

### Research Content

The paper focuses on causal discovery in sparse linear models in which cycles and hidden confounders are allowed.
Specifically, it compares four causal discovery techniques, two versions of the LLC method (LLC-NF and LLC-F) and two ASP-based methods (ASP-d and ASP-s), and evaluates their performance under different experimental settings and dataset sizes.

### Method Overview

1. **LLC method**:
   - **LLC-NF**: Uses LLC without the additional constraint step and does not assume faithfulness.
   - **LLC-F**: Combines LLC with additional constraints that rely on faithfulness, and therefore assumes faithfulness.
2. **ASP method**:
   - **ASP-d**: Uses d-separation, which is appropriate for linear and nonlinear acyclic models as well as linear cyclic models.
   - **ASP-s**: Uses σ-separation, which is appropriate for linear and nonlinear acyclic models as well as nonlinear cyclic models.

### Evaluation Metrics

- **ROC AUC**: The area under the ROC curve, computed from the score each method assigns to each graph feature (for example, the presence of a particular edge).
- **Accuracy**: A threshold is fixed; a feature whose score falls below the threshold is predicted to be absent, otherwise it is predicted to be present. Accuracy is computed separately for each SCM.

### Experimental Design

- **Data generation**: Random graphs with five observed nodes and two confounding variables are simulated; edges are placed at random, and no node has an in-degree greater than two. On average, each graph contains 6.1 edges and two bidirected edges.
- **Coefficient sampling**: The coefficients of the linear structural equations are drawn uniformly from \([-1.1, -0.1] \cup [0.1, 1.1]\), so that every effect is bounded away from zero and therefore detectable.
- **Cyclic structure**: Each simulated SCM contains at least one cycle that is not a self-loop.
- **Dataset**: Data are generated from 150 randomly generated SCMs.
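
The data-generation steps above can be illustrated with a small simulator for linear cyclic SCMs with hidden confounders. This is a minimal sketch under stated assumptions, not the authors' generator: all function names (`random_scm`, `sample`, `has_cycle`, `solve`, and so on) are hypothetical; each hidden confounder is assumed to act on one randomly chosen pair of observed nodes; cycles are obtained by rejection sampling; and interventions are assumed to be surgical (an intervened node loses its incoming edges and confounder influence and keeps only independent noise). The in-degree distribution is also an assumption, so the sketch will not necessarily reproduce the reported average of 6.1 edges. Each sample is the equilibrium solution of \(x = Bx + e\), i.e. \(x = (I - B)^{-1} e\), where `B[i][j]` is the coefficient of \(x_j\) in the equation for \(x_i\).

```python
import random

def sample_coef(rng):
    # Uniform on [-1.1, -0.1] U [0.1, 1.1]: both intervals have length 1, so a
    # fair random sign times a magnitude uniform on [0.1, 1.1] is uniform on the union.
    magnitude = rng.uniform(0.1, 1.1)
    return magnitude if rng.random() < 0.5 else -magnitude

def has_cycle(parents, n):
    # Depth-first search for a back edge in the directed graph parent -> child.
    children = [[] for _ in range(n)]
    for child, ps in enumerate(parents):
        for p in ps:
            children[p].append(child)
    state = [0] * n  # 0 = unvisited, 1 = on the DFS stack, 2 = finished

    def visit(u):
        state[u] = 1
        for v in children[u]:
            if state[v] == 1 or (state[v] == 0 and visit(v)):
                return True
        state[u] = 2
        return False

    return any(state[u] == 0 and visit(u) for u in range(n))

def random_cyclic_graph(n, max_indegree, rng):
    # Assumption: draw parent sets without self-loops (in-degree <= max_indegree)
    # and reject until the graph has a cycle; any cycle then has length >= 2.
    while True:
        parents = [rng.sample([j for j in range(n) if j != i], rng.randint(0, max_indegree))
                   for i in range(n)]
        if has_cycle(parents, n):
            return parents

def random_scm(n=5, n_confounders=2, max_indegree=2, seed=None):
    rng = random.Random(seed)
    parents = random_cyclic_graph(n, max_indegree, rng)
    B = [[0.0] * n for _ in range(n)]
    for i, ps in enumerate(parents):
        for j in ps:
            B[i][j] = sample_coef(rng)  # directed edge j -> i
    # Assumption: each hidden confounder affects one random pair of observed
    # nodes, which induces a bidirected edge between them.
    conf = []
    for _ in range(n_confounders):
        a, b = rng.sample(range(n), 2)
        conf.append({a: sample_coef(rng), b: sample_coef(rng)})
    return B, conf

def solve(A, b):
    # Gaussian elimination with partial pivoting for A x = b.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def sample(B, conf, n_samples, intervened=frozenset(), seed=None):
    rng = random.Random(seed)
    n = len(B)
    # Surgical intervention (assumption): intervened nodes lose all incoming edges.
    Bi = [[0.0] * n if i in intervened else row[:] for i, row in enumerate(B)]
    A = [[(1.0 if i == j else 0.0) - Bi[i][j] for j in range(n)] for i in range(n)]
    data = []
    for _ in range(n_samples):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent noise terms
        for effects in conf:
            h = rng.gauss(0.0, 1.0)  # value of the unobserved confounder
            for node, w in effects.items():
                if node not in intervened:
                    e[node] += w * h
        data.append(solve(A, e))  # equilibrium: (I - B) x = e
    return data
```

For example, `B, conf = random_scm(seed=0)` followed by `sample(B, conf, 1000, intervened={0})` would draw 1,000 samples from a hypothetical experiment intervening on node 0. Solving the linear system once per sample keeps the sketch free of third-party dependencies at the cost of speed.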
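
The two evaluation metrics described above can be sketched as follows. This is a minimal, dependency-free illustration, not the paper's evaluation code: the function names are hypothetical, a feature's label is assumed to be 1 when the feature is present in the true graph and 0 otherwise, ROC AUC is computed through its rank interpretation (the probability that a present feature scores higher than an absent one, with ties counted as one half), and averaging the per-SCM accuracies is only one possible way to aggregate them.

```python
def roc_auc(scores, labels):
    # Rank-based ROC AUC: fraction of (present, absent) feature pairs in which
    # the present feature gets the higher score; ties count as 0.5.
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one present and one absent feature")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def thresholded_accuracy(scores, labels, threshold):
    # A feature is predicted present iff its score is not below the threshold.
    preds = [s >= threshold for s in scores]
    return sum(p == bool(y) for p, y in zip(preds, labels)) / len(labels)

def mean_accuracy_per_scm(results, threshold):
    # results: one (scores, labels) pair per SCM. Accuracy is computed for each
    # SCM separately; averaging them is an assumption made for illustration.
    accs = [thresholded_accuracy(s, y, threshold) for s, y in results]
    return sum(accs) / len(accs)
```

With scores `[0.9, 0.4, 0.5, 0.2]` for features labelled `[1, 1, 0, 0]`, `roc_auc` returns 0.75 (three of the four present/absent pairs are ranked correctly), and `thresholded_accuracy` with threshold 0.45 returns 0.5. The AUC does not depend on any threshold, which is why it complements the thresholded accuracy.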
### Conclusion

Through these methods and evaluation metrics, the paper aims to provide a comprehensive comparative study that helps researchers and practitioners understand the strengths and weaknesses of different causal discovery methods when cycles and hidden confounders are present. This, in turn, supports the choice of an appropriate algorithm for practical problems.