CAT: Concept-level backdoor ATtacks for Concept Bottleneck Models

Songning Lai, Jiayu Yang, Yu Huang, Lijie Hu, Tianlang Xue, Zhangyi Hu, Jiaxu Li, Haicheng Liao, Yutao Yue

2024-10-07

Abstract: Despite the transformative impact of deep learning across multiple domains, the inherent opacity of these models has driven the development of Explainable Artificial Intelligence (XAI). Among these efforts, Concept Bottleneck Models (CBMs) have emerged as a key approach to improve interpretability by leveraging high-level semantic information. However, CBMs, like other machine learning models, are susceptible to security threats, particularly backdoor attacks, which can covertly manipulate model behaviors. Because the community has not yet studied concept-level backdoor attacks on CBMs, and in the spirit of "better the devil you know than the devil you don't know," we introduce CAT (Concept-level Backdoor ATtacks), a methodology that leverages the conceptual representations within CBMs to embed triggers during training, enabling controlled manipulation of model predictions at inference time. An enhanced attack pattern, CAT+, incorporates a correlation function to systematically select the most effective and stealthy concept triggers, thereby optimizing the attack's impact. Our comprehensive evaluation framework assesses both the attack success rate and stealthiness, demonstrating that CAT and CAT+ maintain high performance on clean data while achieving significant targeted effects on backdoored datasets. This work underscores the potential security risks associated with CBMs and provides a robust testing methodology for future security assessments.

Computer Vision and Pattern Recognition, Cryptography and Security

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses concept-level backdoor attacks on Concept Bottleneck Models (CBMs). Specifically, it focuses on the following points:

1. **Concealment of Concept-level Backdoor Attacks**:
   - Traditional backdoor attacks typically embed obvious patterns or patches in images, which can be easily identified through visual inspection. CBMs, however, rely on high-level concept representations, which allows for more covert attacks at the concept level.
   - The paper proposes the CAT (Concept-level Backdoor ATtacks) method, which uses the concept space in CBMs to embed triggers during training, thereby manipulating model predictions at inference time while keeping the probability of detection low.
2. **Enhancing Attack Effectiveness**:
   - The paper further introduces an enhanced version, CAT+, which incorporates a correlation function to systematically select the most effective and covert concept triggers.
   - CAT+ employs an iterative poisoning strategy that gradually selects and updates concept triggers, significantly improving the concealment and effectiveness of the attack.
3. **Evaluating Attack Success Rate and Concealment**:
   - The paper constructs a comprehensive evaluation framework to assess CAT and CAT+ in terms of attack success rate and concealment.
   - Experimental results show that CAT and CAT+ maintain high performance on clean datasets while achieving significant targeted effects on backdoored datasets.
4. **Highlighting Potential Security Risks**:
   - The paper points out that although CBMs improve model interpretability, they also face security threats, particularly backdoor attacks.
   - Studying concept-level backdoor attacks gives a better understanding of their mechanisms and supports the development of effective defenses, especially for critical decision-making tasks in fields such as medicine, finance, and national security.

### Summary

This paper systematically explores concept-level backdoor attacks on CBMs by introducing the CAT and CAT+ methods, filling a research gap in this area. It demonstrates the effectiveness and concealment of these attacks and provides a comprehensive evaluation framework that can serve as a reference for future research and applications.
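To make the idea of a concept-level trigger concrete, the following is a minimal sketch, not the authors' implementation. It assumes concepts are stored as a binary matrix with one row per sample, that a trigger is a fixed assignment of values to a few chosen concept indices, and that poisoned samples are relabeled to the target class. The function names (`poison_concepts`, `attack_success_rate`), the poisoning rate, and the toy predictor are all hypothetical and exist only for illustration; the correlation-based trigger selection of CAT+ is not modeled here.

```python
import numpy as np

def poison_concepts(concepts, labels, trigger_idx, trigger_vals,
                    target_class, rate, rng):
    """Return poisoned copies of (concepts, labels) plus the poisoned row indices.

    Assumed scheme: pick a fraction `rate` of the samples that are not already
    in the target class, overwrite their trigger concepts with `trigger_vals`,
    and relabel them to `target_class`.
    """
    concepts = concepts.copy()
    labels = labels.copy()
    candidates = np.flatnonzero(labels != target_class)
    n_poison = int(round(rate * len(candidates)))
    chosen = rng.choice(candidates, size=n_poison, replace=False)
    # Write the trigger pattern into the chosen rows and trigger columns.
    concepts[np.ix_(chosen, trigger_idx)] = trigger_vals
    labels[chosen] = target_class
    return concepts, labels, chosen

def attack_success_rate(predict, concepts, labels, trigger_idx,
                        trigger_vals, target_class):
    """Fraction of non-target samples classified as the target once triggered."""
    mask = labels != target_class
    triggered = concepts[mask].copy()
    triggered[:, trigger_idx] = trigger_vals
    return float(np.mean(predict(triggered) == target_class))

# Toy data: 200 samples, 10 binary concepts, 5 classes (all values made up).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10)).astype(float)
y = rng.integers(0, 5, size=200)
trigger_idx = np.array([2, 7])        # hypothetical trigger concepts
trigger_vals = np.array([1.0, 0.0])   # hypothetical trigger values

Xp, yp, chosen = poison_concepts(X, y, trigger_idx, trigger_vals,
                                 target_class=0, rate=0.1, rng=rng)

def toy_backdoored_predict(c):
    """Stand-in for a backdoored label predictor: fires only on the trigger."""
    hit = np.all(c[:, trigger_idx] == trigger_vals, axis=1)
    return np.where(hit, 0, 1)

asr = attack_success_rate(toy_backdoored_predict, X, y,
                          trigger_idx, trigger_vals, target_class=0)
print(f"poisoned {len(chosen)} samples, toy ASR = {asr:.2f}")
```

In a real experiment, `toy_backdoored_predict` would be replaced by the concept-to-label head of a CBM trained on `(Xp, yp)`, and the attack success rate would be reported alongside accuracy on clean, untriggered data to measure stealthiness.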

B3: Backdoor Attacks Against Black-box Machine Learning Models

Guarding the Gate: ConceptGuard Battles Concept-Level Backdoors in Concept Bottleneck Models

ATTEQ-NN: Attention-based QoE-aware Evasive Backdoor Attacks

Understanding and Enhancing Robustness of Concept-Based Models

Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats

BELT: Old-School Backdoor Attacks can Evade the State-of-the-Art Defense with Backdoor Exclusivity Lifting

Enhanced Coalescence Backdoor Attack Against DNN Based on Pixel Gradient

Composite Backdoor Attacks Against Large Language Models

An Effective and Resilient Backdoor Attack Framework against Deep Neural Networks and Vision Transformers

SATBA: An Invisible Backdoor Attack Based On Spatial Attention

Multi-target Backdoor Attacks for Code Pre-trained Models

Stealthy Targeted Backdoor Attacks against Image Captioning

Backdoor Attacks with Wavelet Embedding: Revealing and enhancing the insights of vulnerabilities in visual object detection models on transformers within digital twin systems

Dynamic Backdoor Attacks Against Machine Learning Models

SABER: Model-agnostic Backdoor Attack on Chain-of-Thought in Neural Code Generation

Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models

CAMH: Advancing Model Hijacking Attack in Machine Learning

Clean-Label Backdoor Attacks on Video Recognition Models

DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks