Abstract: Label differential privacy (DP) is a framework that protects the privacy of labels in training datasets, while the feature vectors are public. Existing approaches protect the privacy of labels by flipping them randomly, and then train a model to make the output approximate the privatized label. However, as the number of classes $K$ increases, stronger randomization is needed, so the performance of these methods becomes significantly worse. In this paper, we propose a vector approximation approach, which is easy to implement and introduces little additional computational overhead. Instead of flipping each label into a single scalar, our method converts each label into a random vector with $K$ components, whose expectations reflect class conditional probabilities. Intuitively, vector approximation retains more information than scalar labels. A brief theoretical analysis shows that the performance of our method only decays slightly with $K$. Finally, we conduct experiments on both synthesized and real datasets, which validate our theoretical analysis as well as the practical performance of our method.

What problem does this paper attempt to address?

The paper addresses the significant performance drop that existing methods suffer under the Label Differential Privacy (Label DP) framework when the number of classes $K$ is large. Specifically:

1. **Background and Problem Description**:
   - In supervised learning, label differential privacy aims to protect the privacy of labels in the training dataset, while the feature vectors are public.
   - Existing methods protect privacy by randomly flipping labels, but as the number of classes $K$ increases, stronger randomization is required, resulting in a significant drop in model performance.
2. **Limitations of Existing Methods**:
   - Existing methods such as Randomized Response, RRWithPrior, and ALIBI protect privacy by converting each label into a single scalar. In the multi-class case, the performance of these methods drops sharply as $K$ increases.
   - From an information-theoretic perspective, a single scalar can convey only limited information. As $K$ increases, it therefore becomes increasingly difficult to maintain the statistical dependence between the original labels and the privatized labels, and model performance drops.
3. **Method Proposed in the Paper**:
   - The authors propose a label differential privacy method based on vector approximation. Each label is converted into a random vector $Z=(Z(1),\dots,Z(K))\in \{0, 1\}^K$, where the expectation of $Z(j)$ reflects the conditional class probability.
   - This method retains more information than a scalar label, especially when $K$ is large, and can therefore achieve better performance.
4. **Theoretical Analysis and Experimental Verification**:
   - The paper provides a brief theoretical analysis indicating that the performance of this method declines only slightly as $K$ increases.
   - Experimental results on synthetic data and standard benchmark datasets support the theoretical analysis and show that this method significantly outperforms existing methods when $K$ is large.

In summary, the main contribution of this paper is a new label differential privacy method based on vector approximation, which addresses the significant performance degradation of existing methods in multi-class classification tasks and is validated both theoretically and experimentally.
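
The contrast between scalar flipping and a vector-valued privatized label can be illustrated with a minimal sketch. The summary above does not state the exact randomization probabilities used in the paper, so the code makes an assumption: it applies independent randomized response to each coordinate of the one-hot label, keeping each bit with probability $e^{\epsilon/2}/(1+e^{\epsilon/2})$. Two different labels differ in exactly two one-hot coordinates, so this choice satisfies $\epsilon$-label DP, but it may not match the authors' mechanism in every detail. All function names are hypothetical. The standard $K$-ary randomized response keep probability $e^{\epsilon}/(e^{\epsilon}+K-1)$ is included for comparison.

```python
import math
import random


def rr_keep_probability(epsilon, num_classes):
    """Probability that standard K-ary randomized response keeps the true label."""
    return math.exp(epsilon) / (math.exp(epsilon) + num_classes - 1)


def bit_keep_probability(epsilon):
    """Per-coordinate keep probability (an assumption, not taken from the source)."""
    return math.exp(epsilon / 2) / (1 + math.exp(epsilon / 2))


def privatize_label(label, num_classes, epsilon, rng):
    """Hypothetical helper: turn a label in {0, ..., K-1} into Z in {0, 1}^K.

    Each one-hot bit is flipped independently with probability 1 - p_keep.
    """
    p_keep = bit_keep_probability(epsilon)
    z = []
    for j in range(num_classes):
        bit = 1 if j == label else 0
        if rng.random() >= p_keep:
            bit = 1 - bit  # flip this coordinate
        z.append(bit)
    return z


def debias(z, epsilon):
    """Rescale Z so that its expectation equals the one-hot label (epsilon > 0).

    Under the assumed mechanism, E[Z(j)] = (1 - p) + (2p - 1) * 1{Y = j}.
    """
    p_keep = bit_keep_probability(epsilon)
    return [(bit - (1 - p_keep)) / (2 * p_keep - 1) for bit in z]


if __name__ == "__main__":
    rng = random.Random(0)
    # Scalar flipping: the keep probability shrinks as K grows.
    for k in (2, 10, 100):
        print(k, round(rr_keep_probability(1.0, k), 4))
    # Vector approximation: the per-bit keep probability does not depend on K.
    print(privatize_label(3, num_classes=10, epsilon=2.0, rng=rng))
```

In this sketch the per-bit keep probability is independent of $K$, whereas the scalar keep probability decays roughly like $1/K$ for fixed $\epsilon$. Averaging the debiased vectors over many samples with similar features gives an estimate of the class conditional probabilities, which is the sense in which the expectation of $Z(j)$ "reflects" them.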

Enhancing Learning with Label Differential Privacy by Vector Approximation

DPMLBench: Holistic Evaluation of Differentially Private Machine Learning

Label differential privacy via clustering

Does Label Differential Privacy Prevent Label Inference Attacks?

Differentially Private Variational Inference for Non-conjugate Models

DPCL: Contrastive Representation Learning with Differential Privacy

Label Differential Privacy via Aggregation

Differentially Private Random Feature Model

Optimal Differentially Private Model Training with Public Data

Spectral-DP: Differentially Private Deep Learning through Spectral Perturbation and Filtering

Differentially Private Knowledge Distillation via Synthetic Text Generation

Differential Privacy With Variant-Noise For Gaussian Processes Classification

Directional Privacy for Deep Learning

Improving the Privacy and Practicality of Objective Perturbation for Differentially Private Linear Learners

Learning with User-Level Local Differential Privacy

Improving Differentially Private Models with Active Learning

PrivBV: Distance-aware Encoding for Distributed Data with Local Differential Privacy

A PATE-based Approach for Training Graph Neural Networks under Label Differential Privacy

Differentially Private Support Vector Machines with Knowledge Aggregation

Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent

Differentially Private Reward Estimation with Preference Feedback