Comprehensive Review and Empirical Evaluation of Causal Discovery Algorithms for Numerical Data

Wenjin Niu, Zijun Gao, Liyan Song, Lingbo Li
2024-09-04
Abstract: Causal analysis has become an essential component in understanding the underlying causes of phenomena across various fields. Despite its significance, the existing literature on causal discovery algorithms is fragmented: methodologies are inconsistent, with no universal classification standard for existing methods, and evaluations are incomplete, because data characteristics are rarely analyzed jointly with algorithm performance when benchmarking. This study addresses these gaps by conducting an exhaustive review and empirical evaluation of causal discovery methods on numerical data, aiming to provide a clearer and more structured understanding of the field. Our research begins with a comprehensive literature review spanning over two decades, analyzing over 200 academic articles and identifying more than 40 representative algorithms. This analysis leads to a structured taxonomy tailored to the complexities of causal discovery, categorizing methods into six main types. To address the lack of comprehensive evaluations, our study conducts an extensive empirical assessment of 29 causal discovery algorithms on multiple synthetic and real-world datasets. We categorize synthetic datasets by size, linearity, and noise distribution, employ five evaluation metrics, and summarize the top-3 algorithm recommendations, providing guidelines for users in various data scenarios. Our results highlight a significant impact of dataset characteristics on algorithm performance. Moreover, a metadata extraction strategy with an accuracy exceeding 80% is developed to assist users in algorithm selection on unknown datasets. Based on these insights, we offer professional and practical guidelines to help users choose the most suitable causal discovery methods for their specific datasets.
Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses two main problems: the fragmentation of the existing literature on causal discovery algorithms and the lack of systematic evaluation. Specifically:

1. **Lack of a unified classification standard**: Existing causal discovery methods have no common classification standard, which makes it difficult for researchers to fully understand the characteristics and scope of application of these methods.
2. **Lack of comprehensive evaluation**: Data characteristics are often ignored when causal discovery algorithms are benchmarked, so comparisons of algorithm performance are incomplete and can be inaccurate.

To fill these gaps, the paper makes the following contributions:

- **Comprehensive review**: The authors review more than 200 academic articles from the past two decades, identify more than 40 representative algorithms, and propose a structured taxonomy that divides the methods into six main types.
- **Empirical evaluation**: 29 causal discovery algorithms are evaluated on multiple synthetic and real-world datasets. Five evaluation metrics are used, and the top three recommended algorithms are summarized for each data scenario.
- **Metadata extraction strategy**: A metadata extraction strategy with an accuracy of over 80% is developed to help users select appropriate algorithms on unknown datasets.
- **Practical guidelines**: Based on these results, the paper provides professional and practical guidelines to help users select the most suitable causal discovery method for a specific dataset.

Through these efforts, the paper aims to provide a clearer and more structured framework that helps researchers and practitioners select and apply causal discovery algorithms.
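To make the evaluation idea concrete, the following minimal sketch scores predicted causal graphs against a ground-truth graph and then picks the three best algorithms per data scenario. It rests on assumptions: the summary above does not name the paper's five metrics, so structural Hamming distance (SHD), a metric commonly used in causal discovery benchmarks, stands in purely for illustration. The scenario label and the algorithm names (`algo_a` and so on) are hypothetical placeholders, not results or methods from the paper.

```python
from collections import defaultdict


def shd(true_edges, pred_edges):
    """Structural Hamming distance between two DAGs.

    Each graph is a set of (cause, effect) pairs. Missing edges, extra
    edges, and reversed edges each count as one error. Assumes both
    graphs are acyclic, so no pair of nodes has edges in both directions.
    """
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    true_skeleton = {frozenset(e) for e in true_edges}
    pred_skeleton = {frozenset(e) for e in pred_edges}
    # Adjacencies present in exactly one of the two graphs.
    distance = len(true_skeleton ^ pred_skeleton)
    # Adjacencies present in both graphs but oriented differently.
    for cause, effect in true_edges:
        if (effect, cause) in pred_edges and (cause, effect) not in pred_edges:
            distance += 1
    return distance


def top_k_per_scenario(results, k=3):
    """Return the k algorithms with the lowest score in each scenario.

    `results` is an iterable of (scenario, algorithm, score) tuples,
    where a lower score is better (as with SHD).
    """
    by_scenario = defaultdict(list)
    for scenario, algorithm, score in results:
        by_scenario[scenario].append((score, algorithm))
    return {
        scenario: [algorithm for _, algorithm in sorted(rows)[:k]]
        for scenario, rows in by_scenario.items()
    }


# Hypothetical ground truth and prediction: one reversed edge, one extra edge.
truth = {("A", "B"), ("B", "C")}
prediction = {("B", "A"), ("B", "C"), ("A", "C")}
example_distance = shd(truth, prediction)

# Hypothetical scores for one scenario (names and numbers are made up).
scores = [
    ("linear-gaussian", "algo_a", 2),
    ("linear-gaussian", "algo_b", 0),
    ("linear-gaussian", "algo_c", 5),
    ("linear-gaussian", "algo_d", 1),
]
recommendations = top_k_per_scenario(scores)
```

Grouping scores by scenario before ranking mirrors the point made above: a single global ranking would hide how dataset characteristics change which algorithm performs best.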