Advancing microdata privacy protection: A review of synthetic data methods

Jingchen Hu, Claire McKay Bowen
DOI: https://doi.org/10.1002/wics.1636
2023-11-14
WIREs Computational Statistics
Abstract: Synthetic data generation is a powerful tool for privacy protection when considering public release of record‐level data files. Initially proposed about three decades ago, it has generated significant research and application interest. To meet the pressing demand of data privacy protection in a variety of contexts, the field needs more researchers and practitioners. This review provides a comprehensive introduction to synthetic data, including technical details of their generation and evaluation. Our review also addresses the challenges and limitations of synthetic data, discusses practical applications, and provides thoughts for future work.
Graphical abstract caption: Risk‐utility tradeoff of synthetic data.
This article is categorized under: Statistical and Graphical Methods of Data Analysis > Modeling Methods and Algorithms