The Causal Loss: Driving Correlation to Imply Causation

Moritz Willig, Matej Zečević, Devendra Singh Dhami, Kristian Kersting
DOI: https://doi.org/10.48550/arXiv.2110.12066
2021-10-23
Abstract: Most algorithms in classical and contemporary machine learning focus on correlation-based dependence between features to drive performance. Although success has been observed in many relevant problems, these algorithms fail when the underlying causality is inconsistent with the assumed relations. We propose a novel model-agnostic loss function called Causal Loss that improves the interventional quality of the prediction using an intervened neural-causal regularizer. In support of our theoretical results, our experimental illustration shows how causal loss bestows a non-causal associative model (like a standard neural net or decision tree) with interventional capabilities.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the poor performance of correlation-based machine-learning models when the underlying causal structure is inconsistent with the relations they assume. Specifically:

1. **Existing problems**:
   - Most classical and modern machine-learning algorithms rely on correlations between features to drive performance.
   - These algorithms perform well on many tasks, such as computer vision and natural language processing.
   - However, when the underlying causal relationships are inconsistent with the assumed relationships, these correlation-based models often perform poorly. For example, deep neural networks rely on cross-correlations between features in image classification tasks but extract causal information poorly.

2. **Proposed new method**:
   - The authors propose a new, model-agnostic loss function, **Causal Loss**, which improves the interventional quality of predictions by introducing an intervened neural-causal regularizer.
   - With the causal loss, non-causal associative models (such as standard neural networks or decision trees) gain interventional capabilities and can therefore handle causal relationships better.

3. **Objectives**:
   - Transform correlation-based models into models that incorporate the underlying causal relationships in the data.
   - Lift classical discriminative machine-learning models from the first level (association) to the second level (intervention) of the Pearl Causal Hierarchy (PCH).
   - Verify the effectiveness of the causal loss experimentally and show how it applies to different models (such as neural networks and decision trees).

4. **Contributions**:
   - Propose a loss function that makes correlation-based models more causal.
   - Introduce Causal Sum-Product Networks (CaSPN) and prove that they can be used as causal losses.
   - Show that both differentiable and non-differentiable models can benefit from the causal loss function, and that the causal loss has a strong feedback effect on training neural classifiers.
   - Show that decision trees learned with the causal loss can be compact while matching the performance of highly parameterized classical decision trees.

Through these improvements, the paper attempts to bridge the gap between causal models and correlation-based models, so that machine-learning models do not rely on simple statistical correlations when dealing with causal relationships but instead capture the causal mechanisms of the data-generating process.
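
The gap between the association and intervention levels described above can be illustrated with a toy example. The sketch below is not the paper's Causal Loss: the paper uses a CaSPN as the regularizer, whereas here samples drawn under an intervention from a made-up structural causal model stand in for it. The graph (Z → X, Z → Y, X → Y), its coefficients, and the names `sample`, `fit_slope`, and `lam` are all assumptions chosen for illustration.

```python
import numpy as np

def sample(n, rng, intervene=False):
    """Toy SCM (hypothetical): Z -> X, Z -> Y, X -> Y with true effect 1.0."""
    z = rng.normal(size=n)                    # hidden confounder
    if intervene:
        x = rng.normal(size=n)                # do(X): X no longer depends on Z
    else:
        x = 2.0 * z + rng.normal(size=n)      # observational: X depends on Z
    y = 1.0 * x + 3.0 * z + rng.normal(size=n)
    return x, y

def fit_slope(x_obs, y_obs, x_int, y_int, lam):
    """Closed-form minimizer over w of
    mean((y_obs - w*x_obs)**2) + lam * mean((y_int - w*x_int)**2)."""
    num = np.mean(x_obs * y_obs) + lam * np.mean(x_int * y_int)
    den = np.mean(x_obs ** 2) + lam * np.mean(x_int ** 2)
    return num / den

rng = np.random.default_rng(0)
x_obs, y_obs = sample(50_000, rng)
x_int, y_int = sample(50_000, rng, intervene=True)

# lam = 0: purely associative fit, biased by the confounder Z.
w_assoc = fit_slope(x_obs, y_obs, x_int, y_int, lam=0.0)
# Large lam: the interventional penalty dominates the objective.
w_reg = fit_slope(x_obs, y_obs, x_int, y_int, lam=1000.0)

print(f"associative slope: {w_assoc:.2f}")
print(f"regularized slope: {w_reg:.2f}")
```

In this toy model the purely associative slope converges to 1 + 3·Cov(X, Z)/Var(X) = 2.2, while the heavily regularized slope approaches the true interventional effect of 1.0. The weight `lam` controls how strongly the interventional term overrides the observational fit.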