Causal structure learning from time series: Large regression coefficients may predict causal links better in practice than small p-values

Sebastian Weichwald, Martin E Jakobsen, Phillip B Mogensen, Lasse Petersen, Nikolaj Thams, Gherardo Varando
DOI: https://doi.org/10.48550/arXiv.2002.09573
2020-09-02
Abstract: In this article, we describe the algorithms for causal structure learning from time series data that won the Causality 4 Climate competition at the Conference on Neural Information Processing Systems 2019 (NeurIPS). We examine how our combination of established ideas achieves competitive performance on semi-realistic and realistic time series data exhibiting common challenges in real-world Earth sciences data. In particular, we discuss a) a rationale for leveraging linear methods to identify causal links in non-linear systems, b) a simulation-backed explanation as to why large regression coefficients may predict causal links better in practice than small p-values and thus why normalising the data may sometimes hinder causal structure learning. For benchmark usage, we detail the algorithms here and provide implementations at https://github.com/sweichwald/tidybench. We propose the presented competition-proven methods for baseline benchmark comparisons to guide the development of novel algorithms for structure learning from time series.
Machine Learning, Applications
What problem does this paper attempt to address?
The main problem this paper addresses is **inferring causal structure from time-series data**, particularly in climate science. Specifically, the authors explore how linear methods can identify causal relationships in non-linear systems and explain why large regression coefficients may predict causal links better than small p-values. The paper also discusses the effect of data normalization on causal structure learning, pointing out that in some cases regression coefficients computed on the original (unnormalized) data may be the more effective score for the existence of a causal link.

### Summary of the core issues in the paper:

1. **Causal structure learning**: How to effectively infer causal relationships between variables from time-series data.
2. **Application of linear methods**: Even when the system is non-linear, linear methods can still be effective at identifying causal relationships.
3. **Regression coefficients vs. p-values**: In practice, large regression coefficients may predict causal links better than small p-values.
4. **Impact of data normalization**: In some cases, normalizing the data may hinder causal structure learning.

### Specific issues:

- **How to infer causal relationships from time-series data**: The paper describes several algorithms, namely SLARAC, QRBS, LASAR and SELVAR, which identify causal relationships between variables through regression analysis.
- **Effectiveness of linear methods**: Even if the system is non-linear, the regression coefficients of a linear fit can still capture causal relationships between variables.
- **Comparison between regression coefficients and p-values**: Through a simulation-backed analysis, the paper shows that in some cases large regression coefficients predict causal links more accurately than small p-values.
- **Impact of data normalization**: The paper points out that in some cases normalization may degrade the performance of causal structure learning, because it can discard useful information contained in the raw data.

### Conclusion:

Through analysis and simulations, the paper argues that linear methods can identify causal relationships effectively even in non-linear systems, and that in some cases large regression coefficients predict causal links better than small p-values. These findings are relevant to causal structure learning in climate science and other fields.
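
To make the points about regression coefficients and normalization concrete, the following is a minimal sketch and not the authors' SLARAC, QRBS, LASAR or SELVAR implementations (those are in the tidybench repository). It assumes a toy linear VAR(1) system, and all function names (`simulate_var1`, `lagged_coefficient_scores`, `standardise`) are hypothetical. It scores each candidate link by the absolute value of a lag-1 least-squares coefficient, once on the raw data and once on standardised data.

```python
import numpy as np


def simulate_var1(A, n_steps, rng):
    """Simulate X_t = A @ X_{t-1} + noise with standard normal noise (toy assumption)."""
    d = A.shape[0]
    X = np.zeros((n_steps, d))
    for t in range(1, n_steps):
        X[t] = A @ X[t - 1] + rng.standard_normal(d)
    return X


def lagged_coefficient_scores(X):
    """Regress X_t on X_{t-1} by least squares and return absolute coefficients.

    Convention used here: scores[i, j] is the score for the link i -> j.
    """
    past, present = X[:-1], X[1:]
    past = past - past.mean(axis=0)
    present = present - present.mean(axis=0)
    # Solve present ~= past @ B, so B[i, j] is the effect of lagged i on j.
    B, *_ = np.linalg.lstsq(past, present, rcond=None)
    return np.abs(B)


def standardise(X):
    """Centre each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


rng = np.random.default_rng(0)
# Hypothetical chain 0 -> 1 -> 2 with autoregressive self-links.
A = np.array([[0.5, 0.0, 0.0],
              [0.8, 0.5, 0.0],
              [0.0, 0.8, 0.5]])
X = simulate_var1(A, 5000, rng)

raw_scores = lagged_coefficient_scores(X)
std_scores = lagged_coefficient_scores(standardise(X))

print("true |coefficients| (i -> j):\n", np.abs(A.T))
print("scores on raw data:\n", np.round(raw_scores, 2))
print("scores on standardised data:\n", np.round(std_scores, 2))
```

In this sketch, standardising multiplies the coefficient for the link i -> j by the ratio of the standard deviations of variables i and j, so the ranking of candidate links can differ between the raw and the standardised scores. Whether the raw ranking is the better one depends on the data-generating system, which is the question the paper's simulations examine.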