Validation Assessment of Privacy‐Preserving Synthetic Electronic Health Record Data: Comparison of Original Versus Synthetic Data on Real‐World COVID‐19 Vaccine Effectiveness

Echo Wang, Katrina Mott, Hongtao Zhang, Sivan Gazit, Gabriel Chodick, Mehmet Burcu
DOI: https://doi.org/10.1002/pds.70019
2024-10-09
Pharmacoepidemiology and Drug Safety
Abstract:
Purpose: To assess the validity of privacy‐preserving synthetic data by comparing results from analyses of synthetic versus original electronic health record (EHR) data.
Methods: A published retrospective cohort study by Maccabi Healthcare Services in Israel on the real‐world effectiveness of COVID‐19 vaccines was replicated using synthetic data generated from the same source, and the results were compared between the synthetic and original datasets. The endpoints were COVID‐19 infection, symptomatic COVID‐19 infection, and hospitalization due to infection. Each endpoint was also assessed in several demographic and clinical subgroups. Several metrics were used to compare synthetic versus original estimates: standardized mean differences (SMD), decision agreement, estimate agreement, confidence interval overlap, and the Wald test. Synthetic data were generated five times to assess the stability of the results.
Results: The distribution of demographic and clinical characteristics demonstrated very small differences (
pharmacology & pharmacy; public, environmental & occupational health
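The abstract names the metrics used to compare synthetic against original results but does not define them. The following is a minimal sketch of one common way to compute each one. It assumes ratio-type effect estimates (for example, relative risks, from which vaccine effectiveness would be 1 − RR) reported with 95% confidence intervals. The study's exact definitions may differ. In particular, the decision-agreement rule, the log-scale Karr-style interval overlap clipped at zero, and the independence assumption in the Wald test are illustrative choices, not the authors' stated method. The class and function names are hypothetical, and the numbers in the `__main__` section are made up.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96, for a two-sided 95% interval


def smd_continuous(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference with a pooled-variance denominator."""
    return (mean_a - mean_b) / math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)


def smd_binary(p_a, p_b):
    """SMD for a proportion, e.g. the share of a cohort in a subgroup."""
    return (p_a - p_b) / math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)


@dataclass
class RatioEstimate:
    """Hypothetical container: a ratio estimate and its 95% CI."""
    point: float
    lower: float
    upper: float

    def log_se(self):
        # Back out the standard error of log(point) from the CI width.
        return (math.log(self.upper) - math.log(self.lower)) / (2 * Z95)


def decision_agreement(orig, synth, null=1.0):
    """Assumed rule: same significance status and, if significant,
    the same direction relative to the null value."""
    def verdict(e):
        if e.upper < null:
            return "below"
        if e.lower > null:
            return "above"
        return "not significant"
    return verdict(orig) == verdict(synth)


def estimate_agreement(orig, synth):
    """True if the synthetic point estimate lies inside the original CI."""
    return orig.lower <= synth.point <= orig.upper


def ci_overlap(orig, synth):
    """Karr-style interval overlap on the log scale, clipped at 0:
    1 means identical intervals, 0 means disjoint intervals."""
    lo_o, hi_o = math.log(orig.lower), math.log(orig.upper)
    lo_s, hi_s = math.log(synth.lower), math.log(synth.upper)
    overlap = max(0.0, min(hi_o, hi_s) - max(lo_o, lo_s))
    return 0.5 * (overlap / (hi_o - lo_o) + overlap / (hi_s - lo_s))


def wald_test(orig, synth):
    """Two-sided Wald test of equal log estimates. Treats the two estimates
    as independent, a simplification since synthetic data derive from the
    original. Returns (z, p)."""
    diff = math.log(orig.point) - math.log(synth.point)
    se = math.sqrt(orig.log_se() ** 2 + synth.log_se() ** 2)
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    # Made-up illustrative values, not results from the study.
    print(smd_continuous(50, 10, 51, 10))  # -0.1
    orig = RatioEstimate(0.20, 0.15, 0.27)
    synth = RatioEstimate(0.22, 0.16, 0.30)
    print(decision_agreement(orig, synth))  # True
    print(estimate_agreement(orig, synth))  # True
    print(ci_overlap(orig, synth))
    print(wald_test(orig, synth))
```

Working on the log scale keeps ratio intervals symmetric around the point estimate. This is why the standard error can be recovered from the CI width. Repeating these comparisons across the five synthetic replicates would give a sense of how stable each metric is.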
What problem does this paper attempt to address?