Abstract: Privacy-preserving machine learning (PPML) enables multiple data owners to contribute their data privately to a set of servers that run a secure multi-party computation (MPC) protocol to train a joint ML model. In these protocols, the input data remains private throughout the training process, and only the resulting model is made available. While this approach benefits privacy, it also exacerbates the risks of data poisoning, where compromised data owners induce undesirable model behavior by contributing malicious datasets. Existing MPC mechanisms can mitigate certain poisoning attacks, but these measures are not exhaustive. To complement existing poisoning defenses, we introduce UTrace: a framework for User-level Traceback of poisoning attacks in PPML. UTrace computes user responsibility scores using gradient similarity metrics aggregated across the most relevant samples in an owner's dataset. UTrace is effective at low poisoning rates and is resilient to poisoning attacks distributed across multiple data owners, unlike existing unlearning-based methods. We introduce methods for checkpointing gradients with low storage overhead, enabling traceback in the absence of data owners at deployment time. We also design several optimizations that reduce traceback time and communication in MPC. We provide a comprehensive evaluation of UTrace across four datasets from three data modalities (vision, text, and malware) and show its effectiveness against 10 poisoning attacks.

What problem does this paper attempt to address?

This paper addresses the problem of tracing data poisoning attacks back to their source in privacy-preserving machine learning (PPML). Specifically, the authors focus on how to identify the malicious users in a private collaborative learning setting who influence the trained model's behavior by contributing tampered data.

### Main problems

1. **Risks of data poisoning attacks**:
   - In PPML, the input data remains private throughout training, which also increases the risk of data poisoning. Attackers can induce undesirable model behavior by contributing malicious datasets.
2. **Limitations of existing methods**:
   - Existing secure multi-party computation (MPC) mechanisms can mitigate some poisoning attacks, but these measures are not exhaustive. In particular, existing unlearning-based methods are not effective against poisoning attacks distributed across multiple data owners.

### Solution: UTrace Framework

To address these challenges, the authors introduce UTrace, a framework for user-level traceback of data poisoning attacks. Its main features include:

- **User responsibility scoring**:
  - UTrace computes a responsibility score for each user from gradient similarity metrics, aggregating the similarity scores of the most relevant samples in that user's dataset.
- **Checkpointing with low storage overhead**:
  - To perform traceback at deployment time without the participation of data owners, UTrace introduces checkpointing methods, such as storing final-layer gradients and random gradient projections, that reduce the required storage space.
- **Optimization of communication and time efficiency**:
  - UTrace includes several optimizations that reduce the number of communication rounds and the running time in MPC, making traceback more practical.

### Formula representation

The formulas used in the UTrace framework are as follows:

1. **Sample-level impact score \(I_{\text{cos}}\)**:
   \[ I_{\text{cos}}(z, \hat{z})=\sum_{t \in T} \eta_t \frac{\langle\nabla_\theta \ell(\theta_t; z), \nabla_\theta \ell(\theta_t; \hat{z})\rangle}{\|\nabla_\theta \ell(\theta_t; z)\|_2\,\|\nabla_\theta \ell(\theta_t; \hat{z})\|_2} \]
2. **User-level impact score \(I_{\text{s}}^{\text{cos}}\)**:
   \[ I_{\text{s}}^{\text{cos}}(D_i, \hat{z}, R)=\frac{1}{|D_i|} \sum_{z \in D_i} I_{\text{cos}}(z, \hat{z}) \]
3. **User-level impact score \(I_{\text{p}}^{\text{cos}}\)**:
   \[ I_{\text{p}}^{\text{cos}}(D_i, \hat{z}, R)=\sum_{t \in T} \eta_t \frac{\langle\nabla_\theta L(\theta_t; D_i), \nabla_\theta \ell(\theta_t; \hat{z})\rangle}{\|\nabla_\theta L(\theta_t; D_i)\|_2\,\|\nabla_\theta \ell(\theta_t; \hat{z})\|_2} \]
4. **Top-k selected user responsibility score**
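
The first three formulas above can be illustrated with a minimal plaintext sketch. It is not the paper's MPC implementation, and it rests on several assumptions: \(z\) is read as a training sample, \(\hat{z}\) as the sample under investigation, \(\theta_t\) as a stored checkpoint and \(\eta_t\) as its learning rate (the summary does not define these symbols); \(L(\theta_t; D_i)\) is taken to be the mean per-sample loss over \(D_i\); and gradients are assumed to be precomputed and passed in as plain lists. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero gradient contributes no similarity.
        return 0.0
    return dot / (norm_u * norm_v)


def sample_score(lrs: Sequence[float], grads_z: List[Vector],
                 grads_zhat: List[Vector]) -> float:
    """I_cos(z, z_hat): learning-rate-weighted cosine summed over checkpoints.

    grads_z[t] and grads_zhat[t] are the gradients at checkpoint t.
    """
    return sum(eta * cosine(gz, gh)
               for eta, gz, gh in zip(lrs, grads_z, grads_zhat))


def user_score_mean(lrs: Sequence[float], user_grads: List[List[Vector]],
                    grads_zhat: List[Vector]) -> float:
    """I_s^cos: average of the sample-level scores over the user's dataset.

    user_grads[j][t] is the gradient of the user's j-th sample at checkpoint t.
    """
    scores = [sample_score(lrs, g, grads_zhat) for g in user_grads]
    return sum(scores) / len(scores)


def user_score_pooled(lrs: Sequence[float], user_grads: List[List[Vector]],
                      grads_zhat: List[Vector]) -> float:
    """I_p^cos: cosine between the user's dataset-level gradient and z_hat's.

    Assumption: the dataset-level gradient is the mean of per-sample gradients.
    """
    total = 0.0
    n = len(user_grads)
    for t, eta in enumerate(lrs):
        dim = len(grads_zhat[t])
        pooled = [sum(g[t][k] for g in user_grads) / n for k in range(dim)]
        total += eta * cosine(pooled, grads_zhat[t])
    return total


# Toy data: two checkpoints, two-dimensional gradients (made-up values).
lrs = [0.1, 0.05]
grads_zhat = [[1.0, 0.0], [0.0, 1.0]]
# User A's samples point in the same direction as z_hat at every checkpoint.
user_a = [[[2.0, 0.0], [0.0, 3.0]], [[1.0, 0.0], [0.0, 1.0]]]
# User B's samples are orthogonal to z_hat at every checkpoint.
user_b = [[[0.0, 1.0], [1.0, 0.0]], [[0.0, -1.0], [-1.0, 0.0]]]

score_a = user_score_mean(lrs, user_a, grads_zhat)
score_b = user_score_mean(lrs, user_b, grads_zhat)
```

In this toy data, user A scores higher than user B under both aggregations, since A's gradients align with those of \(\hat{z}\). Because cosine similarity is scale-invariant, using the sum rather than the mean of per-sample gradients in `user_score_pooled` would give the same result.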

UTrace: Poisoning Forensics for Private Collaborative Learning

Poison Forensics: Traceback of Data Poisoning Attacks in Neural Networks

Hiding in Plain Sight: Differential Privacy Noise Exploitation for Evasion-resilient Localized Poisoning Attacks in Multiagent Reinforcement Learning

FL-PTD: A Privacy Preserving Defense Strategy Against Poisoning Attacks in Federated Learning.

With Great Dispersion Comes Greater Resilience: Efficient Poisoning Attacks and Defenses for Linear Regression Models

Poisoning Prevention in Federated Learning and Differential Privacy via Stateful Proofs of Execution

On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient Shaping

FLTracer: Accurate Poisoning Attack Provenance in Federated Learning

DPFLA: Defending Private Federated Learning Against Poisoning Attacks

Leveraging MTD to Mitigate Poisoning Attacks in Decentralized FL with Non-IID Data

Potion: Towards Poison Unlearning

Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets

Partner in Crime: Boosting Targeted Poisoning Attacks against Federated Learning

Defending against Data Poisoning Attacks in Federated Learning via User Elimination

DP-Poison: Poisoning Federated Learning under the Cover of Differential Privacy

Learning to Poison Large Language Models During Instruction Tuning

Unraveling the Connections between Privacy and Certified Robustness in Federated Learning Against Poisoning Attacks

Tracing Back the Malicious Clients in Poisoning Attacks to Federated Learning

MODEL: A Model Poisoning Defense Framework for Federated Learning via Truth Discovery

Persistent Pre-Training Poisoning of LLMs