Toward Evaluating Robustness of Reinforcement Learning with Adversarial Policy

Xiang Zheng, Xingjun Ma, Shengjie Wang, Xinyu Wang, Chao Shen, Cong Wang

2024-04-26

Abstract: Reinforcement learning agents are susceptible to evasion attacks during deployment. In single-agent environments, these attacks can occur through imperceptible perturbations injected into the inputs of the victim policy network. In multi-agent environments, an attacker can manipulate an adversarial opponent to influence the victim policy's observations indirectly. While adversarial policies offer a promising technique to craft such attacks, current methods are either sample-inefficient due to poor exploration strategies or require extra surrogate model training under the black-box assumption. To address these challenges, in this paper, we propose Intrinsically Motivated Adversarial Policy (IMAP) for efficient black-box adversarial policy learning in both single- and multi-agent environments. We formulate four types of adversarial intrinsic regularizers (maximizing the adversarial state coverage, policy coverage, risk, or divergence) to discover potential vulnerabilities of the victim policy in a principled way. We also present a novel bias-reduction method to balance the extrinsic objective and the adversarial intrinsic regularizers adaptively. Our experiments validate the effectiveness of the four types of adversarial intrinsic regularizers and the bias-reduction method in enhancing black-box adversarial policy learning across a variety of environments. Our IMAP successfully evades two types of defense methods, adversarial training and robust regularizer, decreasing the performance of the state-of-the-art robust WocaR-PPO agents by 34%-54% across four single-agent tasks. IMAP also achieves a state-of-the-art attacking success rate of 83.91% in the multi-agent game YouShallNotPass. Our code is available at https://github.com/x-zheng16/IMAP.

Machine Learning

What problem does this paper attempt to address?

This paper addresses the vulnerability of reinforcement learning agents to evasion attacks at deployment time. In a single-agent environment, such attacks inject imperceptible perturbations into the inputs of the victim's policy network; in a multi-agent environment, an attacker instead controls an adversarial opponent whose behavior indirectly influences the victim's observations. Existing adversarial policy methods are either sample-inefficient because of poor exploration strategies or require training an extra surrogate model under the black-box assumption. To address these limitations, the paper proposes the Intrinsically Motivated Adversarial Policy (IMAP), which aims to make black-box adversarial policy learning more efficient in both single-agent and multi-agent environments. IMAP searches for potential vulnerabilities in the victim's policy using four types of adversarial intrinsic regularizers, which maximize adversarial state coverage, policy coverage, risk, or divergence. It also introduces a bias-reduction method that adaptively balances the extrinsic objective against the adversarial intrinsic regularizers. The experiments validate the four regularizers and the bias-reduction method across a variety of environments. In short, the paper's core goal is to evaluate the robustness of reinforcement learning agents more rigorously, especially in black-box settings, by learning stronger adversarial policies more efficiently.
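To make the idea of an intrinsic regularizer concrete, the sketch below shows one way an adversary's reward could combine an extrinsic term with a state-coverage bonus. It is an illustration under stated assumptions, not the method from the paper: the function names, the k-nearest-neighbor novelty estimate, and the fixed `weight` are all hypothetical, and the fixed weight stands in for the adaptive balancing that the paper's bias-reduction method performs.

```python
import math

def knn_coverage_bonus(state, visited, k=3):
    """Hypothetical state-coverage bonus (an assumption, not IMAP's formula).

    Returns log(1 + distance to the k-th nearest previously visited state),
    so states far from everything seen so far earn a larger bonus.
    """
    if not visited:
        return 0.0
    dists = sorted(math.dist(state, s) for s in visited)
    kth = dists[min(k, len(dists)) - 1]  # fall back to the farthest if fewer than k
    return math.log(1.0 + kth)

def adversary_reward(victim_reward, state, visited, weight=0.1, k=3):
    """Extrinsic term (negated victim reward) plus a weighted intrinsic bonus.

    `weight` is a fixed illustrative constant; the paper balances the two
    terms adaptively, which this sketch does not attempt to reproduce.
    """
    extrinsic = -victim_reward
    intrinsic = knn_coverage_bonus(state, visited, k)
    return extrinsic + weight * intrinsic

visited = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
familiar = knn_coverage_bonus((0.0, 0.0), visited)
novel = knn_coverage_bonus((5.0, 5.0), visited)
print(familiar < novel)
```

With these toy states, the point far from all visited states receives the larger bonus, which is the behavior a coverage-maximizing regularizer is meant to encourage.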

IMAP: Intrinsically Motivated Adversarial Policy

MARNet: Backdoor Attacks Against Cooperative Multi-Agent Reinforcement Learning

Adversarial Policies: Attacking Deep Reinforcement Learning

Curiosity-Driven and Victim-Aware Adversarial Policies

Robust Multi-Agent Reinforcement Learning against Adversaries on Observation

SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems

Beyond Worst-case Attacks: Robust RL with Adaptive Defense via Non-dominated Policies

Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space

Dissecting Adversarial Robustness of Multimodal LM Agents

Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms

Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations

Attacking Cooperative Multi-Agent Reinforcement Learning by Adversarial Minority Influence

Attacking and Defending Deep Reinforcement Learning Policies

Robust Deep Reinforcement Learning with Adversarial Attacks

Robust Reinforcement Learning on State Observations with Learned Optimal Adversary

RLUC: Strengthening Robustness by Attaching Constraint Considerations to Policy Network

Attacking Deep Reinforcement Learning with Decoupled Adversarial Policy

Robust Adaptive Ensemble Adversary Reinforcement Learning

Understanding Adversarial Attacks on Observations in Deep Reinforcement Learning

Multi-adversarial Testing and Retraining for Improving the Robustness of Autonomous Driving Policies