Abstract: This study probes the vulnerabilities of cooperative multi-agent reinforcement learning (c-MARL) under adversarial attacks, a critical determinant of c-MARL's worst-case performance prior to real-world implementation. Current observation-based attacks, constrained by white-box assumptions, overlook c-MARL's complex multi-agent interactions and cooperative objectives, resulting in impractical and limited attack capabilities. To address these shortcomings, we propose Adversarial Minority Influence (AMI), a practical and strong attack for c-MARL. AMI is a practical black-box attack and can be launched without knowing victim parameters. AMI is also strong because it considers the complex multi-agent interactions and the cooperative goal of the agents, enabling a single adversarial agent to unilaterally mislead the majority of victims into forming targeted worst-case cooperation. This mirrors the minority influence phenomenon in social psychology. To achieve maximum deviation in victim policies under complex agent-wise interactions, our unilateral attack aims to characterize and maximize the impact of the adversary on the victims. This is achieved by adapting a unilateral agent-wise relation metric derived from mutual information, thereby mitigating the adverse effects of victim influence on the adversary. To lead the victims into a jointly detrimental scenario, our targeted attack deceives victims into a long-term, cooperatively harmful situation by guiding each victim towards a specific target, determined through a trial-and-error process executed by a reinforcement learning agent. Through AMI, we achieve the first successful attack against real-world robot swarms and effectively fool agents in simulated environments, including StarCraft II and Multi-Agent MuJoCo, into collectively worst-case scenarios. The source code and demonstrations can be found at: https://github.com/DIG-Beihang/AMI.

BLAST: A Stealthy Backdoor Leverage Attack against Cooperative Multi-Agent Deep Reinforcement Learning based Systems

MARNet: Backdoor Attacks Against Cooperative Multi-Agent Reinforcement Learning

A Spatiotemporal Stealthy Backdoor Attack against Cooperative Multi-Agent Deep Reinforcement Learning

B3: Backdoor Attacks Against Black-box Machine Learning Models

Adversarial attacks on cooperative multi-agent deep reinforcement learning: a dynamic group-based adversarial example transferability method

BadRL: Sparse Targeted Backdoor Attack Against Reinforcement Learning

Backdoors Stuck At The Frontdoor: Multi-Agent Backdoor Attacks That Backfire

Adversarial Attacks on Multiagent Deep Reinforcement Learning Models in Continuous Action Space

Attacking Cooperative Multi-Agent Reinforcement Learning by Adversarial Minority Influence

Mitigating Deep Reinforcement Learning Backdoors in the Neural Activation Space

Cooperative Backdoor Attack in Decentralized Reinforcement Learning with Theoretical Guarantee

On the Robustness of Cooperative Multi-Agent Reinforcement Learning

Spatiotemporally Constrained Action Space Attacks on Deep Reinforcement Learning Agents

CuDA2: An approach for Incorporating Traitor Agents into Cooperative Multi-Agent Systems

Camouflage Adversarial Attacks on Multiple Agent Systems

Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems

Adversarial Attacks on Reinforcement Learning Agents for Command and Control

BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents

Sparse Adversarial Attack in Multi-agent Reinforcement Learning

PolicyCleanse: Backdoor Detection and Mitigation in Reinforcement Learning

A Temporal-Pattern Backdoor Attack to Deep Reinforcement Learning