Learning Privacy-Preserving Student Networks via Discriminative-Generative Distillation

Shiming Ge, Bochao Liu, Pengju Wang, Yong Li, Dan Zeng
DOI: https://doi.org/10.1109/TIP.2022.3226416
2024-09-04
Abstract: While deep models have proved successful in learning rich knowledge from massive well-annotated data, they may pose a privacy leakage risk in practical deployment. It is necessary to find an effective trade-off between high utility and strong privacy. In this work, we propose a discriminative-generative distillation approach to learn privacy-preserving deep models. Our key idea is to take models as a bridge to distill knowledge from private data and then transfer it to learn a student network via two streams. First, the discriminative stream trains a baseline classifier on private data and an ensemble of teachers on multiple disjoint private subsets, respectively. Then, the generative stream takes the classifier as a fixed discriminator and trains a generator in a data-free manner. After that, the generator is used to generate massive synthetic data, which are further applied to train a variational autoencoder (VAE). Among these synthetic data, a few are fed into the teacher ensemble to query labels via differentially private aggregation, while most are embedded into the trained VAE for reconstructing synthetic data. Finally, semi-supervised student learning is performed to simultaneously handle two tasks: knowledge transfer from the teachers with distillation on a few privately labeled synthetic samples, and knowledge enhancement with tangent-normal adversarial regularization on many triples of reconstructed synthetic data. In this way, our approach can control query cost over private data and mitigate accuracy degradation in a unified manner, leading to a privacy-preserving student model. Extensive experiments and analysis clearly show the effectiveness of the proposed approach.
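The abstract states that teachers are trained on disjoint private subsets and that only a few synthetic samples are labeled "via differentially private aggregation", without naming the mechanism. The sketch below assumes a PATE-style Laplace noisy-max aggregator (one common choice, not confirmed as this paper's exact mechanism): each teacher votes for a class, Laplace noise is added to each class count, and the noisiest-largest count wins. The helpers `split_disjoint`, `laplace_noise`, `noisy_max_label`, and `label_few`, the toy teachers, and the noise `scale` are all hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def split_disjoint(data, num_teachers, rng):
    """Shuffle the private data and deal it into num_teachers disjoint subsets."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    # Strided slicing gives every element to exactly one subset.
    return [shuffled[i::num_teachers] for i in range(num_teachers)]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_max_label(teacher_votes, num_classes, scale, rng):
    """Return argmax_j (n_j + Laplace(scale)), where n_j counts votes for class j."""
    counts = Counter(teacher_votes)
    noisy = [counts.get(j, 0) + laplace_noise(scale, rng) for j in range(num_classes)]
    return max(range(num_classes), key=lambda j: noisy[j])


def label_few(synthetic, teachers, budget, num_classes, scale, rng):
    """Query the teacher ensemble on only the first `budget` synthetic samples.

    Every query spends privacy budget, so the remaining samples stay unlabeled
    and are left for the semi-supervised part of student training.
    """
    labeled = []
    for x in synthetic[:budget]:
        votes = [teacher(x) for teacher in teachers]
        labeled.append((x, noisy_max_label(votes, num_classes, scale, rng)))
    return labeled
```

For example, if 240 of 250 teachers vote for class 3, `noisy_max_label` returns 3 with overwhelming probability, while a near-tie between classes is where the noise actually hides individual teachers' votes. Keeping `budget` small mirrors the abstract's point that query cost over private data is controlled.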
Machine Learning, Artificial Intelligence, Cryptography and Security
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

The paper addresses how to learn a privacy-preserving deep model for real-world deployment without significantly reducing inference accuracy. To this end, it proposes a discriminative-generative distillation approach built on three key steps:

1. **Data-Free Generator Learning**: A generator is trained against a fixed baseline classifier, without direct access to the private data, to produce a large amount of synthetic data whose distribution resembles that of the private data.
2. **Differential Privacy Protection**: Labels for a small number of synthetic samples are obtained from a teacher ensemble through differentially private aggregation. This gives a theoretically grounded privacy guarantee and limits how much of the teachers' knowledge the student can access, reducing the risk of privacy leakage.
3. **Tangent-Normal Adversarial Regularization**: A variational autoencoder (VAE) reconstructs the synthetic data, and perturbations are introduced along the tangent and normal directions of the learned data manifold to improve the student's robustness and generalization.

Through these three steps, the method protects privacy while reducing the impact of noisy labels and of instability in the synthetic data, which improves the student model's accuracy. The paper's experimental results support the effectiveness of the proposed method.
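Step 3 rests on splitting a perturbation into a component along the data manifold learned by the VAE (tangent) and a component orthogonal to it (normal). The sketch below shows only that geometric decomposition under simplifying assumptions: a hypothetical toy `decoder` stands in for the trained VAE decoder, the tangent space at x = g(z) is taken as the span of the decoder's Jacobian columns (estimated by central finite differences), and the adversarial search for worst-case directions and the consistency loss of the actual regularizer are omitted. All function names are illustrative.

```python
import math


def decoder(z):
    """Hypothetical toy decoder g: R^2 -> R^3 standing in for a trained VAE decoder."""
    z0, z1 = z
    return [z0, z1, math.sin(z0) + z1 * z1]


def jacobian_columns(g, z, eps=1e-5):
    """Estimate dg/dz_k for each latent coordinate k by central differences."""
    cols = []
    for k in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[k] += eps
        z_minus[k] -= eps
        g_plus, g_minus = g(z_plus), g(z_minus)
        cols.append([(a - b) / (2 * eps) for a, b in zip(g_plus, g_minus)])
    return cols


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Orthonormal basis for the span of `vectors` (the estimated tangent space)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [a - c * b for a, b in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([a / norm for a in w])
    return basis


def tangent_normal_split(v, tangent_basis):
    """Project v onto the tangent space; the residual is the normal component."""
    tangent = [0.0] * len(v)
    for q in tangent_basis:
        c = dot(v, q)
        tangent = [a + c * b for a, b in zip(tangent, q)]
    normal = [a - b for a, b in zip(v, tangent)]
    return tangent, normal
```

For instance, at `z = [0.3, -0.5]` the two Jacobian columns span a plane in R^3, and splitting `v = [1.0, 2.0, 3.0]` yields a tangent part in that plane and a normal part orthogonal to it that sum back to `v`. In a full regularizer, the student would additionally be penalized for changing its prediction under small perturbations in each direction.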