A novel and fully automated platform for synthetic tabular data generation and validation

Hooman H. Rashidi, Samer Albahra, Brian P. Rubin, Bo Hu
DOI: https://doi.org/10.1038/s41598-024-73608-0
IF: 4.6
2024-10-09
Scientific Reports
Abstract: Healthcare data accessibility for machine learning (ML) is encumbered by a range of stringent regulations and limitations. Using synthetic data that mirrors the underlying properties in the real data is emerging as a promising solution to overcome these barriers. We propose a fully automated synthetic tabular neural generator (STNG), which comprises multiple synthetic data generators and integrates an Auto-ML module to validate and comprehensively compare the synthetic datasets generated from different approaches. An empirical study was conducted to demonstrate the performance of STNG using twelve different datasets. The results highlight STNG's robustness and its pivotal role in enhancing the accessibility of validated synthetic healthcare data, thereby offering a promising solution to a critical barrier in ML applications in healthcare.
multidisciplinary sciences