What problem does this paper attempt to address?

The main question this paper addresses is: **Are similarity-based privacy metrics (SBPMs) sufficient to ensure that synthetic data complies with regulatory requirements?** Specifically, the paper explores the following aspects:

1. **Background and Motivation**:
   - Synthetic data (data generated by machine-learning generative models) is increasingly used outside academia, for example in releasing public census data and sharing sensitive financial and health data.
   - Although these applications satisfy formal privacy definitions (such as Differential Privacy, DP), many research papers and companies rely on empirical similarity-based privacy metrics (SBPMs) rather than strict theoretical guarantees.
2. **Main Problem**:
   - The paper questions whether similarity-based privacy metrics are sufficient to ensure that synthetic data complies with regulatory requirements. It argues that they are not, because SBPMs have fundamental flaws and produce unreliable, inconsistent results.
3. **Specific Problems**:
   - **Lack of Theoretical Guarantee**: SBPMs have no clear threat model or strategic adversary, and they ignore important security and regulatory principles.
   - **Privacy Treated as Binary Property**: SBPMs regard privacy leakage as a binary property and assume that any synthetic dataset that passes the test is safe, even though the training data must be queried for each release.
   - **Privacy Treated as Data Property**: SBPMs treat privacy as an attribute of the data rather than of the generative model or process, which produces inconsistent results and increases the risk of privacy leakage.
   - **Non-Comparative Process**: SBPMs do not compare the outcomes with and without an individual's participation, which leaves the system vulnerable to attack.
   - **Misinterpretation**: Test results may be misread; failing to reject the null hypothesis does not mean that privacy is actually protected.
   - **Practical Problems**: Most SBPM implementations require the data to be discretized, which makes the calculations imprecise and overstates the privacy protection.
4. **Counter-example Demonstration**:
   - The paper demonstrates the unreliability and inconsistency of SBPMs through three counter-examples, including one that completely leaks the test data and one that leaks outliers from the training data.

In summary, this paper aims to reveal the deficiencies of similarity-based privacy metrics in ensuring the regulatory compliance of synthetic data, and it calls for the adoption of more stringent theoretical guarantees and evaluation methods.
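
To make the "binary property" and outlier-leakage points more concrete, the following is a minimal sketch of a distance-to-closest-record style check. It is an illustration only, not the paper's own counter-example or any vendor's implementation: the function names (`dcr`, `passes_similarity_test`), the median-based pass rule, and the toy data are all assumptions chosen for brevity.

```python
import math
from statistics import median


def dcr(records, reference):
    """Distance from each record to its closest record in `reference`."""
    return [min(math.dist(r, t) for t in reference) for r in records]


def passes_similarity_test(train, holdout, synthetic):
    """Hypothetical binary pass/fail rule (an assumption for illustration).

    The synthetic data 'passes' if, at the median, its records are no closer
    to the training data than unseen holdout records are.
    """
    return median(dcr(synthetic, train)) >= median(dcr(holdout, train))


# Toy data: four ordinary training records plus one outlier at (50, 50).
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (50.0, 50.0)]
holdout = [(0.5, 0.5), (0.2, 0.8), (0.9, 0.1)]

# The synthetic set contains an exact copy of the training outlier.
synthetic = [(3.0, 3.0), (-2.0, 0.5), (0.5, 4.0), (50.0, 50.0)]

print(passes_similarity_test(train, holdout, synthetic))  # True: the aggregate test passes
print(min(dcr(synthetic, train)))                         # 0.0: yet one record is copied verbatim
```

Under these assumptions, the aggregate statistic reports a pass even though one training record appears verbatim in the release, which is the kind of gap between a pass/fail verdict and actual leakage that the summary describes.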

Synthetic Data, Similarity-based Privacy Metrics, and Regulatory (Non-)Compliance

The Inadequacy of Similarity-based Privacy Metrics: Privacy Attacks against "Truly Anonymous" Synthetic Datasets

When Synthetic Data Met Regulation

A Unified Framework for Quantifying Privacy Risk in Synthetic Data

Synthetic Data: Revisiting the Privacy-Utility Trade-off

Synthetic Data: Methods, Use Cases, and Risks

Comparative Study of Differentially Private Synthetic Data Algorithms from the NIST PSCR Differential Privacy Synthetic Data Challenge

On Utility and Privacy in Synthetic Genomic Data

Privacy risk from synthetic data: practical proposals

Practical privacy metrics for synthetic data

Metric geometry of the privacy-utility tradeoff

Fidelity and Privacy of Synthetic Medical Data

Real Risks of Fake Data: Synthetic Data, Diversity-Washing and Consent Circumvention

Strong statistical parity through fair synthetic data

Synthetic Data Outliers: Navigating Identity Disclosure

Privacy Risk Assessment for Synthetic Longitudinal Health Data

The Real Deal Behind the Artificial Appeal: Inferential Utility of Tabular Synthetic Data

Evaluating Differentially Private Synthetic Data Generation in High-Stakes Domains

Advancing microdata privacy protection: A review of synthetic data methods