Accumulative Poisoning Attacks on Real-time Data

Tianyu Pang, Xiao Yang, Yinpeng Dong, Hang Su, Jun Zhu
DOI: https://doi.org/10.48550/arXiv.2106.09993
2021-10-26
Abstract: Collecting training data from untrusted sources exposes machine learning services to poisoning adversaries, who maliciously manipulate training data to degrade the model accuracy. When trained on offline datasets, poisoning adversaries have to inject the poisoned data in advance before training, and the order of feeding these poisoned batches into the model is stochastic. In contrast, practical systems are more usually trained/fine-tuned on sequentially captured real-time data, in which case poisoning adversaries could dynamically poison each data batch according to the current model state. In this paper, we focus on the real-time settings and propose a new attacking strategy, which affiliates an accumulative phase with poisoning attacks to secretly (i.e., without affecting accuracy) magnify the destructive effect of a (poisoned) trigger batch. By mimicking online learning and federated learning on MNIST and CIFAR-10, we show that model accuracy significantly drops by a single update step on the trigger batch after the accumulative phase. Our work validates that a well-designed but straightforward attacking strategy can dramatically amplify the poisoning effects, with no need to explore complex techniques.
Machine Learning, Cryptography and Security, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses poisoning attacks against machine learning models that are trained or fine-tuned on real-time data streams. Specifically, it studies how an attacker can degrade model accuracy by injecting malicious data into sequentially captured training batches. Unlike poisoning attacks on offline datasets, where the poisoned data must be injected before training and the order in which batches are fed to the model is stochastic, an attacker in the real-time setting can poison each data batch according to the current model state.

#### Main research questions

1. **Challenges of poisoning attacks in real-time data streams**:
   - In the real-time setting, the attacker can adapt the poisoned data to the current model state, an opportunity that offline attacks do not have.
   - To exploit this, the paper proposes a new attack strategy, the accumulative poisoning attack, which amplifies the destructive effect of a single poisoned trigger batch.
2. **Mechanism of accumulative poisoning attacks**:
   - An accumulative phase makes the model state sensitive to a specific trigger batch, so that a single update step on that batch significantly reduces the model's accuracy.
   - The accumulative phase itself does not affect the model's accuracy, which keeps the attack hidden from accuracy monitoring.
3. **Experimental verification**:
   - The paper evaluates the accumulative poisoning attack by mimicking online learning and federated learning on the MNIST and CIFAR-10 datasets.
   - The reported results show that, after the accumulative phase, the model's accuracy can drop from 82.09% to 27.66% after a single update step on the trigger batch.
#### Formula representation

The objective function of the accumulative poisoning attack can be represented as:

\[ \min_{P,A} \nabla_\theta L(S_{val}; A(\theta_T))^\top \nabla_\theta L(P(S_T); A(\theta_T)) \]

where:

- \( L(S_{val}; A(\theta_T)) \) is the loss on the validation set \( S_{val} \), and \( A(\theta_T) \) represents the model parameters after the accumulative phase.
- \( P(S_T) \) is the poisoned trigger batch.
- \( \nabla_\theta L(S_{val}; A(\theta_T)) \) and \( \nabla_\theta L(P(S_T); A(\theta_T)) \) are the gradients of the loss on the validation set and on the trigger batch, respectively, both evaluated at \( A(\theta_T) \).

Minimizing this inner product pushes the two gradients toward opposite directions, so that, to first order, a gradient-descent step on the poisoned trigger batch increases the validation loss. By optimizing this objective while keeping accuracy unchanged during the accumulative phase, the attack makes the model sensitive to a specific trigger batch, so that accuracy collapses once the trigger batch is applied.

#### Conclusion

By proposing the accumulative poisoning attack, the paper shows that in the real-time setting a well-designed but straightforward strategy can substantially amplify the effect of poisoning attacks. This finding underlines the importance of protecting machine learning models trained on real-time data streams and motivates future work on defense mechanisms.
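The first-order reasoning behind the gradient inner-product objective in the Formula representation section can be illustrated with a small numerical sketch. It is not the paper's experimental setup. It assumes a toy least-squares model in NumPy, and the poisoning step is a hypothetical, unconstrained label manipulation chosen only so that the trigger-batch gradient points exactly opposite to the validation gradient. All names (`loss`, `grad`, `y_poison`, the scale `c`, the learning rate `lr`) are illustrative assumptions.

```python
import numpy as np

def loss(theta, X, y):
    """Mean squared error (with a 1/2 factor) of a linear model."""
    return 0.5 * np.mean((X @ theta - y) ** 2)

def grad(theta, X, y):
    """Gradient of `loss` with respect to theta."""
    return X.T @ (X @ theta - y) / len(y)

rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])

# Validation set S_val and a current model state theta near the optimum.
X_val = rng.normal(size=(200, 3))
y_val = X_val @ theta_true
theta = theta_true + 0.1 * rng.normal(size=3)

# Clean trigger batch S_T.
X_trig = rng.normal(size=(32, 3))
y_clean = X_trig @ theta_true

g_val = grad(theta, X_val, y_val)

# Hypothetical poisoning P (assumption, unbounded): choose labels so that
# the batch gradient equals -c * g_val, which makes the inner product
# with the validation gradient as negative as this scale allows.
c = 1.0
n = len(y_clean)
shift = X_trig @ np.linalg.solve(X_trig.T @ X_trig, g_val)
y_poison = X_trig @ theta + n * c * shift

g_clean = grad(theta, X_trig, y_clean)
g_poison = grad(theta, X_trig, y_poison)

# The quantity minimized in the objective, for clean and poisoned batches.
inner_clean = float(g_val @ g_clean)
inner_poison = float(g_val @ g_poison)

# One gradient-descent update on each batch, then re-check validation loss.
lr = 0.1
val_before = loss(theta, X_val, y_val)
val_after_clean = loss(theta - lr * g_clean, X_val, y_val)
val_after_poison = loss(theta - lr * g_poison, X_val, y_val)

print("inner product (clean, poisoned):", inner_clean, inner_poison)
print("validation loss before:", val_before)
print("validation loss after clean step:", val_after_clean)
print("validation loss after poisoned step:", val_after_poison)
```

In this toy the poisoned labels are unbounded, which a real adversary would not be allowed. The attack summarized above instead relies on the accumulative phase \( A \) to move the model into a state where the trigger batch has this kind of effect.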