Learning Privacy-Preserving Student Networks via Discriminative-Generative Distillation

Shiming Ge, Bochao Liu, Pengju Wang, Yong Li, Dan Zeng

DOI: https://doi.org/10.1109/TIP.2022.3226416

2024-09-04

Abstract: While deep models have proved successful in learning rich knowledge from massive well-annotated data, they may pose a privacy leakage risk in practical deployment. It is necessary to find an effective trade-off between high utility and strong privacy. In this work, we propose a discriminative-generative distillation approach to learn privacy-preserving deep models. Our key idea is taking models as a bridge to distill knowledge from private data and then transfer it to learn a student network via two streams. First, the discriminative stream trains a baseline classifier on private data and an ensemble of teachers on multiple disjoint private subsets, respectively. Then, the generative stream takes the classifier as a fixed discriminator and trains a generator in a data-free manner. After that, the generator is used to generate massive synthetic data which are further applied to train a variational autoencoder (VAE). Among these synthetic data, a few of them are fed into the teacher ensemble to query labels via differentially private aggregation, while most of them are embedded into the trained VAE for reconstructing synthetic data. Finally, semi-supervised student learning is performed to simultaneously handle two tasks: knowledge transfer from the teachers with distillation on few privately labeled synthetic data, and knowledge enhancement with tangent-normal adversarial regularization on many triples of reconstructed synthetic data. In this way, our approach can control query cost over private data and mitigate accuracy degradation in a unified manner, leading to a privacy-preserving student model. Extensive experiments and analysis clearly show the effectiveness of the proposed approach.
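
The teacher-labeling step described in the abstract can be sketched as follows. The abstract does not name the aggregation mechanism, so this sketch assumes the Laplace noisy-max aggregation from the original PATE work ("Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data"): every teacher votes for a class, Laplace noise of scale 1/gamma is added to each vote count, and the noisy argmax becomes the label. The helpers `disjoint_partitions`, `laplace`, and `noisy_max_label` and the parameter `gamma` are hypothetical names. Teacher training and privacy accounting across queries are omitted.

```python
import random
from collections import Counter
from typing import List, Sequence


def disjoint_partitions(n_items: int, n_teachers: int, rng: random.Random) -> List[List[int]]:
    """Shuffle private-sample indices and deal them into n_teachers disjoint subsets."""
    idx = list(range(n_items))
    rng.shuffle(idx)
    # Slicing with step n_teachers places every index in exactly one subset.
    return [idx[t::n_teachers] for t in range(n_teachers)]


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two Exponential(rate=1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_max_label(votes: Sequence[int], num_classes: int, gamma: float,
                    rng: random.Random) -> int:
    """Return the argmax of per-class vote counts after adding Lap(1/gamma) noise.

    Smaller gamma means more noise per query (stronger privacy, noisier labels).
    Privacy accounting over many queries is NOT handled here.
    """
    counts = Counter(votes)  # Counter returns 0 for classes nobody voted for
    noisy = [counts[c] + laplace(1.0 / gamma, rng) for c in range(num_classes)]
    return max(range(num_classes), key=noisy.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    subsets = disjoint_partitions(n_items=1000, n_teachers=10, rng=rng)
    print([len(s) for s in subsets])  # ten disjoint subsets of 100 indices each

    # Hypothetical predictions of 10 teachers for one synthetic sample.
    votes = [3, 3, 3, 1, 3, 3, 5, 3, 1, 3]
    print(noisy_max_label(votes, num_classes=10, gamma=0.1, rng=rng))
```

Because every query spends privacy budget, the abstract's design sends only a few synthetic samples through this kind of aggregation and relies on VAE reconstructions for the rest, which is how it controls query cost over the private data.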

Machine Learning, Artificial Intelligence, Cryptography and Security

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

The paper addresses how to learn a privacy-preserving deep model for real-world deployment without a significant loss of inference accuracy. To this end, it proposes a discriminative-generative distillation approach built on three key components:

1. **Data-Free Generator Learning**: A generator is trained against a fixed classifier (itself trained on the private data) that acts as the discriminator. The generator then produces massive synthetic data whose distribution resembles that of the private data, without accessing the private data directly.
2. **Differential Privacy Protection**: Labels for a small subset of the synthetic data are obtained from a teacher ensemble through differentially private aggregation. This provides a theoretical privacy guarantee and limits how much of the teachers' private knowledge the student can access, reducing the risk of privacy leakage.
3. **Tangent-Normal Adversarial Regularization**: A variational autoencoder (VAE) reconstructs the synthetic data, and perturbations along the tangent and normal directions of the learned data manifold are used to regularize the student, improving its robustness and generalization.

Together, these components protect privacy while reducing the impact of noisy labels and of the instability of synthetic data, thereby improving model performance. Experimental results validate the effectiveness of the proposed method.
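
The geometric core of step 3 can be illustrated with a small sketch. In tangent-normal regularization the data manifold is approximated by a VAE decoder: the tangent space at a decoded point is spanned by the columns of the decoder Jacobian, and the normal space is its orthogonal complement. The sketch below shows only this split, assuming a toy decoder (the unit circle) in place of a trained VAE and finite differences in place of automatic differentiation. The helpers `jacobian_columns`, `orthonormal_basis`, `tangent_normal_split`, and `scale_to` are hypothetical; how the paper chooses the adversarial directions and weights the resulting loss terms is not reproduced.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def jacobian_columns(decoder: Callable[[Vector], Vector], z: Vector,
                     h: float = 1e-5) -> List[Vector]:
    """Central finite-difference estimate of the decoder Jacobian, one column per latent dim."""
    cols = []
    for j in range(len(z)):
        zp, zm = list(z), list(z)
        zp[j] += h
        zm[j] -= h
        xp, xm = decoder(zp), decoder(zm)
        cols.append([(a - b) / (2 * h) for a, b in zip(xp, xm)])
    return cols


def orthonormal_basis(cols: List[Vector], tol: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt orthonormalization; drops numerically dependent columns."""
    basis: List[Vector] = []
    for c in cols:
        v = list(c)
        for b in basis:
            p = dot(v, b)
            v = [vi - p * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if norm > tol:
            basis.append([vi / norm for vi in v])
    return basis


def tangent_normal_split(r: Vector, cols: List[Vector]) -> Tuple[Vector, Vector]:
    """Project r onto span(cols) (tangent part); the remainder is the normal part."""
    r_tan = [0.0] * len(r)
    for b in orthonormal_basis(cols):
        p = dot(r, b)
        r_tan = [t + p * bi for t, bi in zip(r_tan, b)]
    r_norm = [ri - ti for ri, ti in zip(r, r_tan)]
    return r_tan, r_norm


def scale_to(v: Vector, eps: float) -> Vector:
    """Rescale v to Euclidean norm eps (a separate budget per direction)."""
    n = math.sqrt(dot(v, v))
    return [eps * vi / n for vi in v] if n > 0 else list(v)


def circle(z: Vector) -> Vector:
    """Toy 'decoder': maps a 1-D latent angle onto the unit circle in R^2."""
    return [math.cos(z[0]), math.sin(z[0])]


if __name__ == "__main__":
    cols = jacobian_columns(circle, [0.0])
    r_tan, r_norm = tangent_normal_split([1.0, 1.0], cols)
    # At angle 0 the tangent is the y-axis and the normal is the x-axis,
    # so r_tan is approximately [0, 1] and r_norm approximately [1, 0].
    print(scale_to(r_tan, 0.1), scale_to(r_norm, 0.05))
```

In a full implementation the perturbation `r` would be chosen adversarially (for example, by a VAT-style power iteration on the change in the student's output), and each rescaled component would feed its own consistency penalty on the student.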


Private Knowledge Transfer via Model Distillation with Generative Adversarial Networks

PKDGAN: Private Knowledge Distillation with Generative Adversarial Networks

Privacy-Preserving Collaborative Deep Learning with Unreliable Participants

Privacy-Preserving Student Learning with Differentially Private Data-Free Distillation

Model Conversion via Differentially Private Data-Free Distillation

Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation

Locally Differentially Private Distributed Deep Learning via Knowledge Distillation

Adversarial Distillation for Learning with Privileged Provisions

Differentially Private Knowledge Distillation via Synthetic Text Generation

Selective Knowledge Sharing for Privacy-Preserving Federated Distillation without A Good Teacher

LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification

Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data

Privacy Distillation: Reducing Re-identification Risk of Multimodal Diffusion Models

A Privacy Knowledge Transfer Method for Clinical Concept Extraction

Synthesizing High-Utility Tabular Data with Enhanced Privacy Via Split-and-Discard Pre-Training

Beyond Inferring Class Representatives: User-Level Privacy Leakage From Federated Learning

Federated Synthetic Data Generation with Differential Privacy