Image-imperceptible Backdoor Attacks
Zhu Shuwen, Luo Ge, Wei Ping, Li Sheng, Zhang Xinpeng, Qian Zhenxing
DOI: https://doi.org/10.11834/jig.220550
2023-01-01
Journal of Image and Graphics
Abstract:
Objective: Deep models are vulnerable to various adversarial attacks, including backdoor attacks. A backdoored model behaves normally on regular inputs but maliciously once a predefined trigger activates the hidden backdoor. To plant such a backdoor, the attacker injects predesigned triggers (e.g., specific patterns such as a square, noise, stripes, or warping) into a portion of the training data. To keep the attack effective while escaping human inspection, existing backdoor attacks either assign clean labels to the poisoned samples or hide the triggers in the poisoned data. Nevertheless, it remains challenging to make an attack imperceptible to visual inspection and to other in-situ defenses at the same time. To resolve this problem, we develop an effective backdoor attack that is imperceptible to human inspection, filters, and statistical detectors.
Method: To generate poisoned samples, a small image serving as the trigger is embedded into the original images by data hiding, and the poisoned samples are then mixed with clean samples to form the final training data. Because the trigger is hidden naturally, each poisoned sample keeps its correct label (label imperceptibility), is similar to the corresponding clean sample (image imperceptibility), and also resists the most advanced statistical analysis methods (statistic imperceptibility). We further develop a one-to-oneself attack paradigm, in which the source class used for poisoning is the target class itself. Unlike the previous paradigms (all-to-one and all-to-all), a portion of the images from the target class is selected for poisoning. Because these images carry the correct target-class label, they remain inconspicuous under human inspection. In contrast, the classical all-to-one and all-to-all paradigms rely on mismatched (wrong) labels, and in them the target class cannot serve as its own source.
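The poisoning step and the contrast between paradigms can be sketched as follows. This is a minimal illustration under explicit assumptions, not the authors' implementation: the abstract does not name its data-hiding scheme, so plain least-significant-bit (LSB) embedding is swapped in here, and the function names (`embed_trigger_lsb`, `extract_trigger_lsb`, `build_poisoned_set`) and the integer `n_poison` parameter are hypothetical. Because the abstract does not say what the 7% poisoning rate is measured against, the sketch takes an absolute count instead of a rate.

```python
import numpy as np

# Hypothetical sketch: plain LSB substitution stands in for the paper's
# unspecified data-hiding scheme; all function names are illustrative only.


def embed_trigger_lsb(cover: np.ndarray, trigger: np.ndarray) -> np.ndarray:
    """Hide the bits of a small uint8 trigger image in the LSBs of a uint8 cover."""
    flat = cover.reshape(-1).copy()
    bits = np.unpackbits(trigger.astype(np.uint8).reshape(-1))  # 8 bits per trigger pixel
    if bits.size > flat.size:
        raise ValueError("trigger has more bits than the cover can hold")
    # Clear each LSB, then write one trigger bit into it (changes a value by at most 1).
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits
    return flat.reshape(cover.shape)


def extract_trigger_lsb(stego: np.ndarray, trigger_shape: tuple) -> np.ndarray:
    """Recover the hidden trigger image from the LSBs of a poisoned image."""
    n_bits = int(np.prod(trigger_shape)) * 8
    bits = stego.reshape(-1)[:n_bits] & 1
    return np.packbits(bits).reshape(trigger_shape)


def build_poisoned_set(images, labels, trigger, target, n_poison,
                       paradigm="one-to-oneself", seed=0):
    """Return (poisoned images, labels, poisoned indices) for a given paradigm.

    one-to-oneself: poison images of the target class itself; labels unchanged.
    all-to-one:     poison images of the other classes and relabel them as
                    `target`, producing the mismatched input-label pairs.
    """
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = np.asarray(labels).copy()
    if paradigm == "one-to-oneself":
        pool = np.flatnonzero(labels == target)  # source class == target class
    elif paradigm == "all-to-one":
        pool = np.flatnonzero(labels != target)  # every other class is a source
    else:
        raise ValueError(f"unknown paradigm: {paradigm}")
    idx = rng.choice(pool, size=n_poison, replace=False)
    for i in idx:
        images[i] = embed_trigger_lsb(images[i], trigger)
    if paradigm == "all-to-one":
        labels[idx] = target  # the relabeling a human inspector could spot
    return images, labels, idx
```

Under `one-to-oneself` the returned labels are identical to the input labels, which is the property the abstract relies on to defeat label-based filtering; under `all-to-one` the poisoned indices end up with labels that disagree with their image content.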
Mislabeled input-label pairs (e.g., a bird image labeled as a cat) may arouse suspicion during human inspection and thus reveal the attack; after such samples are filtered out, training on the remaining (mostly clean) samples renders the attack ineffective. Our attack can also be launched quickly on a pretrained model by fine-tuning it on the same poisoned data. Our label-consistent backdoor attack shows that imperceptibility can be achieved from three aspects: label, image, and statistics.
Result: To verify the effectiveness and invisibility of the proposed method, we compare it with three popular methods on the ImageNet, MNIST, and CIFAR-10 datasets. Under the one-to-oneself paradigm, poisoning only a small proportion (7%) of the original clean samples is enough for the trigger to mislead the poisoned model, which still retains high accuracy on ImageNet, MNIST, and CIFAR-10. When clean samples are tested, the backdoor stays inactive; compared with the clean model on all three datasets, the accuracy of the poisoned model decreases only slightly (by less than 1%). Note that some backdoor attacks change the label of a poisoned image to the target label, and such mislabeled input-label pairs are easily detected in practice. Hence, we do not modify the labels of trigger-injected images, so that, unlike in some classical methods, every input-label pair in our training set is correctly matched. Under the classical all-to-one paradigm, the proposed method maintains the same accuracy on clean samples and achieves comparable attack success rates (above 99%) on poisoned samples. Unlike the trigger of BadNets, our trigger is invisible to human visual inspection: the embedded triggers are imperceptible, and the poisoned images look natural and are hard to distinguish from the original clean images. We also use learned perceptual image patch similarity (LPIPS), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM) as metrics to quantify invisibility.
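The quantities reported in the results can be computed roughly as below. This is a minimal sketch, assuming 8-bit images (peak value 255) and integer class predictions; the function names are hypothetical, and SSIM and LPIPS are omitted because they need more machinery (e.g., scikit-image's `structural_similarity` and a pretrained perceptual network). How the authors count attack success is not stated in the abstract, so excluding samples that already belong to the target class is an assumed, commonly used convention.

```python
import numpy as np


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


def clean_accuracy(preds, labels):
    """Fraction of clean test samples classified correctly."""
    return float(np.mean(np.asarray(preds) == np.asarray(labels)))


def attack_success_rate(preds_on_triggered, true_labels, target):
    """Fraction of triggered test samples classified as the target class.

    Assumption: samples whose true label is already `target` are excluded,
    a common convention that the abstract does not specify.
    """
    preds = np.asarray(preds_on_triggered)
    keep = np.asarray(true_labels) != target
    return float(np.mean(preds[keep] == target))
```

As a sanity check on the PSNR scale: if every 8-bit value changes by exactly 1 (the worst case for LSB embedding), the MSE is 1 and PSNR is 10·log10(255²) ≈ 48.13 dB. This bound holds only for the LSB stand-in, not necessarily for the authors' embedding scheme.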
Compared with the three methods, our poisoned images are the closest to the original images, with a mean LPIPS distance of almost zero. They also obtain the highest SSIM values on all three datasets, indicating that our poisoned samples are more similar to their corresponding benign ones. Meanwhile, our attack achieves the highest PSNR values (above 43 dB on ImageNet, MNIST, and CIFAR-10), reaching 52.4 dB on MNIST.
Conclusion: We propose an imperceptible backdoor attack in which each poisoned image keeps its valid label and carries an invisible trigger. The trigger is hidden in the image by data hiding, so the poisoned images remain similar to the original clean ones. Throughout the whole process, the user cannot perceive any abnormality, and other attackers cannot exploit the trigger. We also design a new attack paradigm, the one-to-oneself attack, for clean-label backdoor attacks: the images selected for poisoning keep their original labels. With this paradigm, most defenses, which assume that poisoned samples carry changed labels, become ineffective. Overall, the proposed backdoor attack shows potential for imperceptibility in terms of label, image, and statistics.