A Data-Driven Two-Phase Multi-Split Causal Ensemble Model for Time Series
Zhipeng Ma, Marco Kemmerling, Daniel Buschmann, Chrismarie Enslin, Daniel Lütticke, Robert H. Schmitt
DOI: https://doi.org/10.3390/sym15050982
2024-03-04
Abstract: Causal inference is a fundamental research topic for discovering the
cause-effect relationships in many disciplines. However, not all algorithms are
equally well-suited for a given dataset. For instance, some approaches can only
identify linear relationships, while others can also handle non-linear ones.
Algorithms further vary in their sensitivity to noise and
their ability to infer causal information from coupled vs. non-coupled time
series. Therefore, different algorithms often generate different causal
relationships for the same input. To achieve a more robust causal inference
result, this publication proposes a novel data-driven two-phase multi-split
causal ensemble model to combine the strengths of different causality base
algorithms. In comparison to existing approaches, the proposed ensemble method
reduces the influence of noise through a data partitioning scheme in the first
phase. To achieve this, the data are initially divided into several partitions
and the base algorithms are applied to each partition. Subsequently, Gaussian
mixture models are used to identify the causal relationships derived from the
different partitions that are likely to be valid. In the second phase, the
relationships identified by each base algorithm are merged according to
three combination rules. The proposed ensemble approach is evaluated using
multiple metrics, among them a newly developed evaluation index for causal
ensemble approaches. We perform experiments using three synthetic datasets with
different volumes and complexity, which are specifically designed to test
causality detection methods under different circumstances with known
ground-truth causal relationships. In these experiments, our causality ensemble
outperforms each of its base algorithms. In practical applications, the use of
the proposed method could hence lead to more robust and reliable causality
results.
Machine Learning, Artificial Intelligence, Methodology
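
The first phase described in the abstract (partition the data, run a base algorithm on each partition, then use a Gaussian mixture model to decide which relationships are likely valid) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the base algorithm is a toy lag-1 correlation score (`lagged_corr`), the mixture is a two-component 1D model fitted with plain EM on the pooled scores (`fit_gmm_1d`), and an edge is kept when its mean score is more likely to come from the higher-mean component. The second phase is not sketched because the abstract does not state the three combination rules.

```python
import math
import random


def split_partitions(series, n_parts):
    """Split a list of rows into n_parts contiguous, equally sized partitions."""
    size = len(series) // n_parts
    return [series[k * size:(k + 1) * size] for k in range(n_parts)]


def lagged_corr(x, y):
    """Toy base algorithm (assumption): |corr(x[t-1], y[t])| as the edge score."""
    a, b = x[:-1], y[1:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return abs(cov) / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def weighted_pdf(v, w, mu, var):
    """Mixture weight times the Gaussian density of v for one component."""
    return w * math.exp(-(v - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(values, n_iter=100):
    """Fit a two-component 1D Gaussian mixture with plain EM."""
    lo, hi = min(values), max(values)
    mu = [lo, hi]                              # start the means at the extremes
    var = [((hi - lo) / 4) ** 2 + 1e-6] * 2
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value
        resp = []
        for v in values:
            p = [weighted_pdf(v, w[k], mu[k], var[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances (with a small floor)
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            mu[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var[k] = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, values)) / nk + 1e-6
            w[k] = nk / len(values)
    return w, mu, var


def phase_one(series, n_parts=5):
    """Score every directed edge per partition, then filter edges with the GMM."""
    n_vars = len(series[0])
    edges = [(i, j) for i in range(n_vars) for j in range(n_vars) if i != j]
    scores = {e: [] for e in edges}
    for part in split_partitions(series, n_parts):
        cols = list(zip(*part))                # rows -> one tuple per variable
        for i, j in edges:
            scores[(i, j)].append(lagged_corr(cols[i], cols[j]))
    pooled = [s for e in edges for s in scores[e]]
    w, mu, var = fit_gmm_1d(pooled)
    strong = 0 if mu[0] > mu[1] else 1         # component with the higher mean
    kept = []
    for e in edges:
        m = sum(scores[e]) / len(scores[e])
        p = [weighted_pdf(m, w[k], mu[k], var[k]) for k in range(2)]
        if p[strong] / (sum(p) or 1e-300) > 0.5:
            kept.append(e)
    return kept


def make_toy_series(n=500, seed=0):
    """Three variables: x1 is driven by x0 at lag 1, x2 is independent noise."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0, 1) for _ in range(n)]
    x1 = [0.0] + [0.9 * x0[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    return [list(row) for row in zip(x0, x1, x2)]


kept = phase_one(make_toy_series())
print(kept)
```

Pooling the per-partition scores before fitting the mixture is only one possible design. It is used here because a single 1D mixture is enough to separate consistently strong scores from noise-level ones in the toy data.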