Abstract: Reinforcement learning agents are susceptible to evasion attacks during deployment. In single-agent environments, these attacks can occur through imperceptible perturbations injected into the inputs of the victim policy network. In multi-agent environments, an attacker can manipulate an adversarial opponent to influence the victim policy's observations indirectly. While adversarial policies offer a promising technique to craft such attacks, current methods are either sample-inefficient due to poor exploration strategies or require extra surrogate model training under the black-box assumption. To address these challenges, in this paper, we propose Intrinsically Motivated Adversarial Policy (IMAP) for efficient black-box adversarial policy learning in both single- and multi-agent environments. We formulate four types of adversarial intrinsic regularizers -- maximizing the adversarial state coverage, policy coverage, risk, or divergence -- to discover potential vulnerabilities of the victim policy in a principled way. We also present a novel bias-reduction method to balance the extrinsic objective and the adversarial intrinsic regularizers adaptively. Our experiments validate the effectiveness of the four types of adversarial intrinsic regularizers and the bias-reduction method in enhancing black-box adversarial policy learning across a variety of environments. Our IMAP successfully evades two types of defense methods, adversarial training and robust regularizer, decreasing the performance of the state-of-the-art robust WocaR-PPO agents by 34%-54% across four single-agent tasks. IMAP also achieves a state-of-the-art attacking success rate of 83.91% in the multi-agent game YouShallNotPass. Our code is available at https://github.com/x-zheng16/IMAP.

Curriculum Adversarial Training for Robust Reinforcement Learning

Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space

Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations

Robustifying Reinforcement Learning Agents via Action Space Adversarial Training

Robust Adaptive Ensemble Adversary Reinforcement Learning

Robust Adversarial Reinforcement Learning with Dissipation Inequation Constraint

Robust Deep Reinforcement Learning with Adversarial Attacks

Learning Robust Policies via Interpretable Hamilton-Jacobi Reachability-Guided Disturbances

Robust Safe Reinforcement Learning under Adversarial Disturbances

Efficient Adversarial Training without Attacking: Worst-Case-Aware Robust Reinforcement Learning

Adversary Agnostic Robust Deep Reinforcement Learning

Improving Robustness of Reinforcement Learning for Power System Control with Adversarial Training

Beyond Worst-case Attacks: Robust RL with Adaptive Defense via Non-dominated Policies

Robust Adversarial Reinforcement Learning via Bounded Rationality Curricula

Robust Proximal Adversarial Reinforcement Learning under Model Mismatch

Toward Evaluating Robustness of Reinforcement Learning with Adversarial Policy

Online Robustness Training for Deep Reinforcement Learning

Improved Robustness and Safety for Autonomous Vehicle Control with Adversarial Reinforcement Learning

RL-Based Method for Benchmarking the Adversarial Resilience and Robustness of Deep Reinforcement Learning Policies

Transferable Adversarial Attacks on Deep Reinforcement Learning with Domain Randomization

Adversarial Skill Learning for Robust Manipulation