Class-Targeted Poisoning Attacks Against DNNs

Jian Chen, Jingyao Wu, Hao Yin, Qiang Li, Wensheng Zhang, Chen Wang
DOI: https://doi.org/10.1109/trustcom60117.2023.00026
2024-01-01
Abstract: In recent years, targeted clean-label poisoning attacks, which maliciously alter the training data without control over the labeling process in order to manipulate the behavior of the predictive model, have been shown to pose a serious threat to deep learning systems. Prior targeted clean-label poisoning attacks have been demonstrated against only one target sample at a time, which is not always applicable in restricted real-world situations. In this paper, we explore targeted clean-label poisoning attacks on a per-class basis: the goal is to misclassify samples from a victim class as a desired class while maintaining the classification accuracy on the other classes in multi-class classification tasks. To achieve this, we present the first class-targeted clean-label poisoning attack, called CTCL, which first crafts clean-label poisons along multiple directions in the target feature space and then enhances the attacking capability of the poisons by reducing the target-class feature information they contain. We illustrate the effectiveness of the proposed CTCL on various deep neural network models. The experimental results demonstrate that our attack is effective, with an average attack success rate of over 80% in comparison with two baseline attacks, while the detection accuracy of state-of-the-art defenses is lower than 65%, illustrating that CTCL can readily escape detection by existing defenses.
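The per-class attack goal stated in the abstract can be made concrete with a small evaluation sketch. It is not taken from the paper: the function name `class_targeted_metrics` and its parameters are hypothetical, and the two quantities it returns (the fraction of victim-class samples predicted as the desired class, and the accuracy on all remaining classes) are one plausible reading of "attack success rate" and "maintaining the classification accuracy of samples on other classes".

```python
from typing import Sequence, Tuple


def class_targeted_metrics(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    victim_class: int,
    target_class: int,
) -> Tuple[float, float]:
    """Return (attack_success_rate, other_class_accuracy).

    Hypothetical helper, not the paper's evaluation code.
    attack_success_rate: share of victim-class samples predicted as target_class.
    other_class_accuracy: accuracy over samples whose true class is not the victim.
    """
    victim_total = victim_hits = 0
    other_total = other_correct = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == victim_class:
            victim_total += 1
            victim_hits += int(pred == target_class)
        else:
            other_total += 1
            other_correct += int(pred == truth)
    if victim_total == 0 or other_total == 0:
        raise ValueError("need at least one victim-class and one other-class sample")
    return victim_hits / victim_total, other_correct / other_total


# Toy labels: class 0 is the victim class, class 1 is the desired class.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 1, 0, 1, 1, 2, 0]
asr, other_acc = class_targeted_metrics(y_true, y_pred, victim_class=0, target_class=1)
```

Under this reading, a successful class-targeted attack drives the first value up while leaving the second close to the accuracy of a model trained on clean data.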