Abstract: Practitioners commonly download pretrained machine learning models from open repositories and finetune them to fit specific applications. We show that this practice introduces a new risk of privacy backdoors. By tampering with a pretrained model's weights, an attacker can fully compromise the privacy of the finetuning data. We show how to build privacy backdoors for a variety of models, including transformers, which enable an attacker to reconstruct individual finetuning samples, with a guaranteed success! We further show that backdoored models allow for tight privacy attacks on models trained with differential privacy (DP). The common optimistic practice of training DP models with loose privacy guarantees is thus insecure if the model is not trusted. Overall, our work highlights a crucial and overlooked supply chain attack on machine learning privacy.

What problem does this paper attempt to address?

Sharing and fine-tuning pre-trained models has become common practice in the modern machine-learning supply chain, but it also opens the door to supply-chain attacks. This paper focuses on privacy vulnerabilities and introduces the concept of **privacy backdoors**: a malicious provider tampers with the weights of a pre-trained model in order to compromise the privacy of the data later used to fine-tune it. The paper shows how to construct privacy backdoors that let an attacker reconstruct individual fine-tuning samples, and shows that these backdoors also enable tight privacy attacks on models trained with differential privacy (DP). As a result, training DP models with loose privacy guarantees is not safe when the pre-trained model is untrusted.

### Main contributions of the paper:

1. **Design and implementation of privacy backdoors**:
   - A single-use backdoor design is proposed: once the backdoor is activated and a data point has been written into the model weights, the backdoor becomes inactive, which prevents subsequent training steps from modifying those weights again.
   - The design is similar to a latch in digital memory. Once data is written, it remains unchanged until the end of training.
2. **Attacks on different models**:
   - The attacks are applied to multi-layer perceptrons (MLPs) and pre-trained Transformer models (such as ViT and BERT), and multiple fine-tuning samples are successfully reconstructed.
   - Under the black-box threat model, an attacker with only query access to the fine-tuned model can still recover entire training inputs.
3. **Attacks on the differentially private SGD algorithm**:
   - Using privacy backdoors, the authors construct the first end-to-end attack on the differentially private SGD (DP-SGD) algorithm proposed by Abadi et al. (2016).
   - The privacy leakage achieved by the attacker almost reaches the theoretical upper bound given by the algorithm's privacy analysis, challenging the assumption that the privacy guarantees of DP-SGD are too conservative in practice.

### Key techniques in the paper:

- **Data Traps**: Specific weights and biases are set in the model so that certain inputs are captured and written into the model weights during fine-tuning.
- **Gradient Amplification**: An amplification module added to the model ensures that a captured data point produces a gradient signal during backpropagation that is large enough to close the backdoor.
- **Feature Separation**: The internal features of the model are divided into three parts: benign features (for downstream tasks), capture keys (for storing information to be captured), and activation signals (for propagating backdoor outputs).
- **Numerical Tricks**: The challenges posed by the GELU activation function and layer normalization in Transformer models are handled so that the backdoor signal neither vanishes nor explodes during training.

### Conclusion:

This paper reveals a new attack vector in the modern machine-learning supply chain and emphasizes the need for stricter privacy protection measures when using untrusted shared models. Through detailed technical design and experimental verification, the authors demonstrate how powerful privacy backdoors can be and pose new challenges to existing privacy protection mechanisms.
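As a rough illustration of the data-trap and latch ideas summarized above, the following sketch uses a single ReLU unit and one plain SGD step. It is a simplification under stated assumptions, not the authors' construction: the function names (`trap_sgd_step`, `reconstruct_input`), the constant upstream gradient, the learning rate, and the restriction to nonnegative inputs are all hypothetical choices made for the example. The sketch relies on one fact: for a unit `h = relu(w·x + b)` that is active on input `x`, the gradient with respect to `w` is proportional to `x` and the gradient with respect to `b` is the proportionality constant, so the ratio of the weight change to the bias change recovers `x`.

```python
# Minimal sketch of a single-use "data trap" on one ReLU unit.
# Assumptions (not from the paper): plain SGD, batch size 1, a constant
# positive upstream gradient dL/dh, and nonnegative input features.

def trap_sgd_step(w, b, x, lr, upstream_grad=1.0):
    """Apply one SGD step to the unit h = relu(w.x + b) for input x.

    Returns the updated (weights, bias). If the unit is inactive on x,
    its gradient is zero and the parameters are returned unchanged.
    """
    pre_activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    if pre_activation <= 0.0:
        return list(w), b  # inactive ReLU: nothing is written
    # dL/dw = upstream_grad * x and dL/db = upstream_grad
    new_w = [wi - lr * upstream_grad * xi for wi, xi in zip(w, x)]
    new_b = b - lr * upstream_grad
    return new_w, new_b


def reconstruct_input(w_before, b_before, w_after, b_after):
    """Recover the captured input as (delta w) / (delta b).

    Returns None if the bias did not change, i.e. the trap never fired.
    """
    delta_b = b_after - b_before
    if delta_b == 0.0:
        return None
    return [(wa - wb) / delta_b for wa, wb in zip(w_after, w_before)]


# Hypothetical initial trap parameters chosen so the first input activates it.
w0, b0 = [0.1, 0.1, 0.1], 0.05
first_input = [0.2, 0.7, 0.5]
second_input = [0.9, 0.1, 0.3]

# A large step (standing in for gradient amplification) writes the first
# input into the weights and pushes the unit far into the inactive region.
w1, b1 = trap_sgd_step(w0, b0, first_input, lr=1.0)

# The unit is now inactive on the second input, so the stored data survives.
w2, b2 = trap_sgd_step(w1, b1, second_input, lr=1.0)

print(reconstruct_input(w0, b0, w2, b2))
```

In this toy setting the large step plays the role of the latch: after the first write the pre-activation is negative for any nonnegative input, so later training steps leave the stored weights untouched. With inputs of mixed sign this closing argument no longer holds as stated.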

Privacy Backdoors: Stealing Data with Corrupted Pretrained Models

Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models

PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps

Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage

Adversarial for Good – Defending Training Data Privacy with Adversarial Attack Wisdom

Private Knowledge Transfer via Model Distillation with Generative Adversarial Networks

Data Stealing Attacks against Large Language Models via Backdooring

TMI! Finetuned Models Leak Private Information from their Pretraining Data

Defending Our Privacy With Backdoors

Seeing the Forest through the Trees: Data Leakage from Partial Transformer Gradients

How Does a Deep Learning Model Architecture Impact Its Privacy? A Comprehensive Study of Privacy Attacks on CNNs and Transformers

Security and Privacy Challenges in Deep Learning Models

Inside the Black Box: Detecting Data Leakage in Pre-trained Language Encoders

Does Differential Privacy Prevent Backdoor Attacks in Practice?

Stand-in Backdoor: A Stealthy and Powerful Backdoor Attack

SecretGen: Privacy Recovery on Pre-Trained Models via Distribution Discrimination

Privacy attacks against deep learning models and their countermeasures

Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks

Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing

Hidden Data Privacy Breaches in Federated Learning

Privacy Risks of Securing Machine Learning Models against Adversarial Examples