CAT: Concept-level backdoor ATtacks for Concept Bottleneck Models

Songning Lai, Jiayu Yang, Yu Huang, Lijie Hu, Tianlang Xue, Zhangyi Hu, Jiaxu Li, Haicheng Liao, Yutao Yue
2024-10-07
Abstract: Despite the transformative impact of deep learning across multiple domains, the inherent opacity of these models has driven the development of Explainable Artificial Intelligence (XAI). Among these efforts, Concept Bottleneck Models (CBMs) have emerged as a key approach to improve interpretability by leveraging high-level semantic information. However, CBMs, like other machine learning models, are susceptible to security threats, particularly backdoor attacks, which can covertly manipulate model behaviors. Because the community has not yet studied concept-level backdoor attacks on CBMs, and in the spirit of "better the devil you know than the devil you don't know," we introduce CAT (Concept-level Backdoor ATtacks), a methodology that leverages the conceptual representations within CBMs to embed triggers during training, enabling controlled manipulation of model predictions at inference time. An enhanced attack pattern, CAT+, incorporates a correlation function to systematically select the most effective and stealthy concept triggers, thereby optimizing the attack's impact. Our comprehensive evaluation framework assesses both the attack success rate and stealthiness, demonstrating that CAT and CAT+ maintain high performance on clean data while achieving significant targeted effects on backdoored datasets. This work underscores the potential security risks associated with CBMs and provides a robust testing methodology for future security assessments.
Computer Vision and Pattern Recognition, Cryptography and Security
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses concept-level backdoor attacks on Concept Bottleneck Models (CBMs). Specifically, it focuses on the following points:

1. **Concealment of Concept-level Backdoor Attacks**:
   - Traditional backdoor attacks typically embed obvious patterns or patches in images, which can be easily identified through visual inspection. CBMs, however, rely on high-level concept representations, which allows more covert attacks at the concept level.
   - The paper proposes the CAT (Concept-level Backdoor ATtacks) method, which uses the concept space of a CBM to embed triggers during training, thereby manipulating model predictions at inference time while keeping the probability of detection low.
2. **Enhancing Attack Effectiveness**:
   - The paper further introduces an enhanced version, CAT+, which incorporates a correlation function to systematically select the most effective and covert concept triggers.
   - CAT+ employs an iterative poisoning strategy that gradually selects and updates concept triggers, improving both the concealment and the effectiveness of the attack.
3. **Evaluating Attack Success Rate and Concealment**:
   - The paper constructs a comprehensive evaluation framework that assesses CAT and CAT+ in terms of attack success rate and concealment.
   - Experimental results show that CAT and CAT+ maintain high performance on clean datasets while achieving significant targeted effects on backdoored datasets.
4. **Highlighting Potential Security Risks**:
   - The paper points out that although CBMs improve model interpretability, they also face security threats, particularly backdoor attacks.
   - Studying concept-level backdoor attacks gives a better understanding of their mechanisms and supports the development of effective defenses, especially for critical decision-making tasks in fields such as medicine, finance, and national security.

### Summary

This paper systematically explores concept-level backdoor attacks on CBMs by introducing the CAT and CAT+ methods, filling a research gap in this area. It demonstrates the effectiveness and concealment of these attacks and provides a comprehensive evaluation framework that can serve as a reference for future research and applications.
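
### Illustrative Sketch

The idea of embedding a trigger in concept space and then measuring attack success rate can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes binary concept vectors, a trigger expressed as a hypothetical mapping from concept index to forced value, poisoning restricted to non-target samples, and a `predict` callable that stands in for the concept-to-label stage of a trained CBM. The names `apply_trigger`, `poison_dataset`, and `attack_success_rate` are invented for this example, and the correlation-based trigger selection of CAT+ is not shown.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Concepts = List[int]            # assumed: binary concept vector, one entry per concept
Sample = Tuple[Concepts, int]   # (concept vector, class label)
Trigger = Dict[int, int]        # assumed: concept index -> value forced by the trigger


def apply_trigger(concepts: Sequence[int], trigger: Trigger) -> Concepts:
    """Return a copy of `concepts` with the trigger concepts set to their forced values."""
    out = list(concepts)
    for idx, value in trigger.items():
        out[idx] = value
    return out


def poison_dataset(
    data: Sequence[Sample],
    trigger: Trigger,
    target_class: int,
    poison_rate: float,
    seed: int = 0,
) -> List[Sample]:
    """Inject the trigger into a fraction of non-target samples and relabel them.

    Assumption: only samples whose label differs from `target_class` are
    poisoned, and `poison_rate` is a fraction of those samples.
    """
    rng = random.Random(seed)
    candidates = [i for i, (_, label) in enumerate(data) if label != target_class]
    n_poison = int(round(poison_rate * len(candidates)))
    chosen = set(rng.sample(candidates, n_poison))

    poisoned: List[Sample] = []
    for i, (concepts, label) in enumerate(data):
        if i in chosen:
            poisoned.append((apply_trigger(concepts, trigger), target_class))
        else:
            poisoned.append((list(concepts), label))
    return poisoned


def attack_success_rate(
    predict: Callable[[Concepts], int],
    test_data: Sequence[Sample],
    trigger: Trigger,
    target_class: int,
) -> float:
    """Fraction of non-target test samples classified as the target once triggered."""
    victims = [concepts for concepts, label in test_data if label != target_class]
    if not victims:
        return 0.0
    hits = sum(predict(apply_trigger(c, trigger)) == target_class for c in victims)
    return hits / len(victims)
```

In use, a label predictor would be trained on the output of `poison_dataset`, then scored twice: on clean test data for ordinary accuracy, and with `attack_success_rate` on triggered test data. A stealthy attack in the sense described above keeps the first number close to that of an unpoisoned model while driving the second one up.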