Abstract: The recent introduction of large language models (LLMs) has revolutionized the field of robotics by enabling contextual reasoning and intuitive human-robot interaction in domains as varied as manipulation, locomotion, and self-driving vehicles. When viewed as a stand-alone technology, LLMs are known to be vulnerable to jailbreaking attacks, wherein malicious prompters elicit harmful text by bypassing LLM safety guardrails. To assess the risks of deploying LLMs in robotics, in this paper, we introduce RoboPAIR, the first algorithm designed to jailbreak LLM-controlled robots. Unlike existing textual attacks on LLM chatbots, RoboPAIR elicits harmful physical actions from LLM-controlled robots, a phenomenon we experimentally demonstrate in three scenarios: (i) a white-box setting, wherein the attacker has full access to the NVIDIA Dolphins self-driving LLM, (ii) a gray-box setting, wherein the attacker has partial access to a Clearpath Robotics Jackal UGV robot equipped with a GPT-4o planner, and (iii) a black-box setting, wherein the attacker has only query access to the GPT-3.5-integrated Unitree Robotics Go2 robot dog. In each scenario and across three new datasets of harmful robotic actions, we demonstrate that RoboPAIR, as well as several static baselines, finds jailbreaks quickly and effectively, often achieving 100% attack success rates. Our results reveal, for the first time, that the risks of jailbroken LLMs extend far beyond text generation, given the distinct possibility that jailbroken robots could cause physical damage in the real world. Indeed, our results on the Unitree Go2 represent the first successful jailbreak of a deployed commercial robotic system. Addressing this emerging vulnerability is critical for ensuring the safe deployment of LLMs in robotics. Additional media is available at: https://robopair.org

BadRobot: Jailbreaking Embodied LLMs in the Physical World

Jailbreaking LLM-Controlled Robots

Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems

Compromising Embodied Agents with Contextual Backdoor Attacks

Evil Geniuses: Delving into the Safety of LLM-based Agents

A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents

Open the Pandora's Box of LLMs: Jailbreaking LLMs Through Representation Engineering

SafeEmbodAI: a Safety Framework for Mobile Robots in Embodied AI Systems

Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics

Distract Large Language Models for Automatic Jailbreak Attack

h4rm3l: A Dynamic Benchmark of Composable Jailbreak Attacks for LLM Safety Assessment

Playing Language Game with LLMs Leads to Jailbreaking

BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents

Exploring the Robustness of Decision-Level Through Adversarial Attacks on LLM-Based Embodied Models

Open Sesame! Universal Black Box Jailbreaking of Large Language Models

Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction

Cognitive Overload: Jailbreaking Large Language Models with Overloaded Logical Thinking

JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models

Efficient LLM-Jailbreaking by Introducing Visual Modality

PathSeeker: Exploring LLM Security Vulnerabilities with a Reinforcement Learning-Based Jailbreak Approach

Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification