Abstract: Language-conditioned robotic learning has significantly enhanced robot adaptability by enabling a single model to execute diverse tasks in response to verbal commands. Despite these advancements, security vulnerabilities within this domain remain largely unexplored. This paper addresses this gap by proposing a novel adversarial prompt attack tailored to language-conditioned robotic models. Our approach involves crafting a universal adversarial prefix that induces the model to perform unintended actions when added to any original prompt. We demonstrate that existing adversarial techniques exhibit limited effectiveness when directly transferred to the robotic domain due to the inherent robustness of discretized robotic action spaces. To overcome this challenge, we propose to optimize adversarial prefixes based on continuous action representations, circumventing the discretization process. Additionally, we identify the beneficial impact of intermediate features on adversarial attacks and leverage the negative gradient of intermediate self-attention features to further enhance attack efficacy. Extensive experiments on VIMA models across 13 robot manipulation tasks validate the superiority of our method over existing approaches and demonstrate its transferability across different model variants.
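The abstract's claim that discretized action spaces are inherently robust can be illustrated with a small sketch. It is not taken from the paper: the `discretize` helper, the bin count of 50, and the coordinate range [0, 1] are all hypothetical choices made for illustration. The point is only that a small shift in a continuous action can vanish once the action is mapped to a bin, which is one plausible reason an attack scored on the discretized output sees little signal.

```python
def discretize(value, n_bins=50, low=0.0, high=1.0):
    """Map a continuous coordinate in [low, high] to an integer bin index.

    Hypothetical helper: the bin count and range are assumptions for
    illustration, not values reported in the paper.
    """
    clipped = min(max(value, low), high)
    idx = int((clipped - low) / (high - low) * n_bins)
    # The upper boundary itself belongs to the last bin.
    return min(idx, n_bins - 1)


clean_action = 0.505   # continuous output for the original prompt
small_shift = 0.512    # slightly perturbed continuous output
large_shift = 0.530    # more strongly perturbed continuous output

# A small continuous change is absorbed by the bin, so the discrete
# action is unchanged; only a larger change crosses a bin boundary.
unchanged = discretize(clean_action) == discretize(small_shift)
changed = discretize(clean_action) != discretize(large_shift)
```

Measuring the perturbation on the continuous value instead (here, `small_shift - clean_action`) still registers the change, which is the intuition behind optimizing against continuous action representations.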

What problem does this paper attempt to address?

This paper addresses the security of language-conditioned robotic models against adversarial attacks. Specifically, the authors study how an adversarial prefix can mislead such models into performing incorrect actions, thereby exposing and validating their security vulnerabilities. The core content of the paper is summarized below.

### Research Background and Problems

1. **Language-Conditioned Robotic Learning**: A single model performs multiple tasks in response to natural language instructions, which greatly improves robot adaptability and flexibility.
2. **Security Challenges**: Despite significant progress in language-conditioned robotic learning, its security, and in particular its vulnerability to adversarial attacks, remains largely unexplored.

### Contributions of the Paper

1. **A New Adversarial Attack Framework**: The authors design a novel adversarial prefix attack for language-conditioned robotic models. The method produces a universal adversarial prefix that, when added to any original prompt, causes the model to perform unintended actions.
2. **Optimizing on Continuous Action Representations**: To circumvent the robustness introduced by the discretization module in the robotic model, the prefix is optimized against continuous action representations rather than the final discretized probability distribution.
3. **Using Intermediate Features to Enhance the Attack**: By leveraging the negative gradient of intermediate self-attention features, the authors perform adversarial distillation, which further strengthens the attack.

### Experimental Verification

- **Experimental Setup**: The authors run extensive experiments on VIMA models, covering 13 robot manipulation tasks.
- **Comparison with Existing Methods**: The new method achieves a higher attack success rate than existing adversarial attack methods across multiple tasks, with an average increase in attack success rate of 7.5%.
- **Visualizing the Attack Effect**: The impact of the attack on robot behavior is shown through specific examples. In visual manipulation tasks, for instance, the robot grabs the wrong object once the adversarial prefix is applied.

### Formula Summary

The formulas involved in the paper include:

- **Discrete Loss Function**: \[ L_{\text{discrete}} = - \| \pi_\theta(p_a \oplus p, h) - a^* \|_2 \]
- **Continuous Behavior Feature Loss**: \[ L_{\text{continuous}} = - \| D_c(p_a \oplus p, h) - D_c(p, h) \|_2 \]
- **Self-Attention Feature Loss**: \[ L_{\text{self-attn}} = - \| F_s(p_a \oplus p, h) - F_s(p, h) \|_2 \]
- **Total Loss Function**: \[ L = \alpha \cdot L_{\text{continuous}} + \beta \cdot L_{\text{self-attn}} \]

### Conclusion

By introducing a new adversarial attack method, this paper reveals security vulnerabilities in language-conditioned robotic models and provides a useful reference for future security research.
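The continuous, self-attention, and total losses from the formula summary can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the vectors stand in for the outputs of \(D_c\) and \(F_s\) (how the paper extracts them is not described here), and the weights `alpha` and `beta` are placeholder values.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def continuous_loss(adv_action, clean_action):
    # L_continuous = -|| D_c(p_a ⊕ p, h) - D_c(p, h) ||_2
    # The inputs are assumed to be the continuous action representations
    # for the prefixed prompt and the original prompt.
    return -l2_distance(adv_action, clean_action)


def self_attention_loss(adv_features, clean_features):
    # L_self-attn = -|| F_s(p_a ⊕ p, h) - F_s(p, h) ||_2
    # The inputs are assumed to be flattened self-attention features.
    return -l2_distance(adv_features, clean_features)


def total_loss(adv_action, clean_action, adv_features, clean_features,
               alpha=1.0, beta=1.0):
    # L = alpha * L_continuous + beta * L_self-attn
    # alpha and beta are placeholder weights, not values from the paper.
    return (alpha * continuous_loss(adv_action, clean_action)
            + beta * self_attention_loss(adv_features, clean_features))


# Toy vectors chosen so the distances are exactly 5 and 10.
loss = total_loss(
    adv_action=[3.0, 4.0], clean_action=[0.0, 0.0],
    adv_features=[0.0, 6.0, 8.0], clean_features=[0.0, 0.0, 0.0],
    alpha=1.0, beta=0.5,
)
```

With the sign convention of the summary, both terms are negative distances, so minimizing `total_loss` pushes the prefixed prompt's actions and attention features away from those of the original prompt. In an actual attack, that minimization would be carried out over the prefix tokens using gradients from the model, which this sketch does not attempt.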

Rethinking the Intermediate Features in Adversarial Attacks: Misleading Robotic Models via Adversarial Distillation
