Spy-Watermark: Robust Invisible Watermarking for Backdoor Attack

Ruofei Wang, Renjie Wan, Zongyu Guo, Qing Guo, Rui Huang
2024-01-04
Abstract: Backdoor attack aims to deceive a victim model when facing backdoor instances while maintaining its performance on benign data. Current methods use manual patterns or special perturbations as triggers, while they often overlook the robustness against data corruption, making backdoor attacks easy to defend in practice. To address this issue, we propose a novel backdoor attack method named Spy-Watermark, which remains effective when facing data collapse and backdoor defense. Therein, we introduce a learnable watermark embedded in the latent domain of images, serving as the trigger. Then, we search for a watermark that can withstand collapse during image decoding, cooperating with several anti-collapse operations to further enhance the resilience of our trigger against data corruption. Extensive experiments are conducted on CIFAR10, GTSRB, and ImageNet datasets, demonstrating that Spy-Watermark overtakes ten state-of-the-art methods in terms of robustness and stealthiness.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper primarily addresses the robustness of triggers in backdoor attacks. Specifically:

1. **Background**:
   - Backdoor attacks aim to deceive the victim model when it encounters backdoor instances while maintaining its performance on benign data.
   - Current methods typically use manual patterns or special perturbations as triggers, but they often overlook robustness to data corruption, which makes the attacks easy to defend against in practice.
2. **Problem**:
   - Existing backdoor attack methods degrade under data corruption, so their triggers are easily removed or defended against.
   - A backdoor attack method is needed that remains effective under both data corruption and backdoor defenses.
3. **Solution**:
   - The paper proposes Spy-Watermark, which embeds a learnable watermark in the latent domain of images to serve as the trigger.
   - The method searches for a watermark that withstands collapse during image decoding, and several anti-collapse operations further strengthen the trigger against data corruption.
   - Experiments on CIFAR10, GTSRB, and ImageNet show that Spy-Watermark surpasses ten state-of-the-art methods in robustness and stealthiness.

In summary, the paper aims to improve the robustness of backdoor attack triggers so that they remain effective under various data corruptions and defense measures.
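
The two goals described above (unchanged performance on benign data, and a trigger that keeps working after corruption) are usually checked with two simple ratios. The sketch below is only an illustration under stated assumptions: the function names `benign_accuracy` and `attack_success_rate`, the toy classifier `toy_predict`, and the toy `corrupt` operation are hypothetical and are not taken from the paper, which does not specify its evaluation code here.

```python
from typing import Callable, Hashable, Sequence, Tuple

Sample = Tuple[int, float]  # (class-bearing value, trigger strength); a toy stand-in for an image


def benign_accuracy(predict: Callable[[Sample], Hashable],
                    inputs: Sequence[Sample],
                    labels: Sequence[Hashable]) -> float:
    """Fraction of clean inputs that the model classifies correctly."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    correct = sum(1 for x, y in zip(inputs, labels) if predict(x) == y)
    return correct / len(inputs)


def attack_success_rate(predict: Callable[[Sample], Hashable],
                        triggered_inputs: Sequence[Sample],
                        true_labels: Sequence[Hashable],
                        target_label: Hashable) -> float:
    """Fraction of triggered inputs (whose true class is not the target)
    that the model assigns to the attacker's target class."""
    candidates = [x for x, y in zip(triggered_inputs, true_labels) if y != target_label]
    if not candidates:
        raise ValueError("no triggered inputs with a non-target true label")
    hits = sum(1 for x in candidates if predict(x) == target_label)
    return hits / len(candidates)


TARGET = 0  # hypothetical target class


def toy_predict(x: Sample) -> int:
    """Toy backdoored classifier: a strong enough trigger forces the target class."""
    value, strength = x
    return TARGET if strength >= 0.5 else value % 3


def corrupt(x: Sample) -> Sample:
    """Toy corruption that weakens the trigger, standing in for noise or compression."""
    value, strength = x
    return (value, strength * 0.6)


clean_inputs = [(0, 0.0), (1, 0.0), (2, 0.0), (4, 0.0)]
clean_labels = [0, 1, 2, 1]
triggered = [(1, 1.0), (2, 0.7), (4, 0.9)]
triggered_true_labels = [1, 2, 1]

acc = benign_accuracy(toy_predict, clean_inputs, clean_labels)
asr = attack_success_rate(toy_predict, triggered, triggered_true_labels, TARGET)
asr_corrupted = attack_success_rate(
    toy_predict, [corrupt(x) for x in triggered], triggered_true_labels, TARGET
)
```

In this toy setup the weaker trigger (strength 0.7) no longer fires after corruption, so the attack success rate drops while benign accuracy is unaffected. That gap between the uncorrupted and corrupted rates is the quantity a robustness-focused trigger is meant to keep small.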