On the causality-preservation capabilities of generative modelling

Yves-Cédric Bauwelinckx, Jan Dhaene, Milan van den Heuvel, Tim Verdonck
DOI: https://doi.org/10.1016/j.cam.2024.116312
IF: 2.872
2024-10-11
Journal of Computational and Applied Mathematics
Abstract: Modelling is essential in both the financial and insurance industries. The emergence of machine learning and deep learning models offers new tools for this, but they often require large datasets that are typically unavailable in business fields due to privacy and ethical concerns. This lack of data is currently one of the main hurdles in developing better models. Generative modelling, such as Generative Adversarial Networks (GANs), can address this issue by creating synthetic data that can be freely shared. While GANs are widely studied in fields like computer vision, their use in business is limited, primarily because business questions often focus on identifying causal effects, whereas GANs and neural networks typically emphasize high-dimensional correlations. This paper explores whether GANs can produce synthetic data that reliably answers causal questions by performing causal analyses on GAN-generated data under varying assumptions. The study includes cross-sectional, time series, and complete structural model scenarios. Findings show that while basic GANs replicate causal relationships in simple cross-sectional data, they struggle with more complex structural models. In contrast, CausalGAN effectively replicates the original causal model, and TimeGAN modifies the causal representation in time series data.
mathematics, applied