Two-Phase Data Synthesis for Income: An Application to the NHIS

Kevin Ros, Henrik Olsson, Jingchen Hu
DOI: https://doi.org/10.48550/arXiv.2006.01686
2020-06-02
Applications
Abstract: We propose a two-phase synthesis process for synthesizing income, a sensitive variable that is usually highly skewed and has a number of reported zeros. We consider two forms of a continuous income variable: a binary form, which is modeled and synthesized in phase 1, and a non-negative continuous form, which is modeled and synthesized in phase 2. We propose Bayesian synthesis models for the two-phase synthesis process, and other synthesis models can readily be implemented. We demonstrate our methods with applications to a sample from the National Health Interview Survey (NHIS). The utility and risk profiles of the generated synthetic datasets are evaluated and compared to results from a single-phase synthesis process.
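The abstract describes the two phases only at a high level. Below is a minimal sketch of the idea, not the authors' models. It assumes the phase-1 binary form is an indicator of zero versus positive income, uses a Beta-Bernoulli model for phase 1 and a normal model on log income (with the standard noninformative prior) for phase 2, and ignores the covariates a real synthesizer would condition on. The function name `synthesize_income` and the toy values are hypothetical.

```python
import math
import random
import statistics


def synthesize_income(incomes, seed=None):
    """Hypothetical two-phase income synthesizer (illustration only).

    Assumption: the phase-1 binary form is the indicator income > 0.
    Phase 1: draw P(income > 0) from its Beta posterior (Beta(1, 1) prior),
             then synthesize one indicator per record.
    Phase 2: draw (mu, sigma^2) for log(income) from the posterior under
             the prior p(mu, sigma^2) proportional to 1 / sigma^2, then
             synthesize a log-normal amount for each positive indicator.
    """
    rng = random.Random(seed)
    n = len(incomes)
    positives = [x for x in incomes if x > 0]
    k = len(positives)
    if k < 2:
        raise ValueError("need at least two positive incomes to fit phase 2")

    # Phase 1: posterior draw of the probability of a positive income.
    p = rng.betavariate(1 + k, 1 + n - k)
    indicators = [rng.random() < p for _ in range(n)]

    # Phase 2: posterior draw of the log-income mean and variance.
    logs = [math.log(x) for x in positives]
    ybar = statistics.fmean(logs)
    s2 = statistics.variance(logs)
    chi2 = rng.gammavariate((k - 1) / 2, 2)  # chi-square with k - 1 df
    sigma2 = (k - 1) * s2 / chi2             # scaled inverse chi-square draw
    mu = rng.gauss(ybar, math.sqrt(sigma2 / k))

    # Records synthesized as zero stay zero; others get a positive amount.
    return [math.exp(rng.gauss(mu, math.sqrt(sigma2))) if flag else 0.0
            for flag in indicators]


if __name__ == "__main__":
    # Toy values for illustration only (not NHIS data).
    toy = [0.0, 0.0, 12000.0, 35000.0, 48000.0, 0.0, 91000.0, 27000.0]
    print(synthesize_income(toy, seed=2020))
```

Drawing the model parameters from their posterior before generating records, rather than plugging in point estimates, lets each synthetic dataset carry parameter uncertainty; splitting the zeros off in phase 1 keeps the skewed positive amounts from being distorted by the point mass at zero.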