Robust off-policy Reinforcement Learning via Soft Constrained Adversary

Kosuke Nakanishi, Akihiro Kubo, Yuji Yasui, Shin Ishii

2024-08-31

Abstract: Recently, robust reinforcement learning (RL) methods against perturbations of input observations have garnered significant attention and evolved rapidly because of RL's potential vulnerability. Although these advanced methods have achieved reasonable success, they have two limitations when the adversary is considered over long-term horizons. First, the mutual dependency between the policy and its corresponding optimal adversary limits the development of off-policy RL algorithms: obtaining the optimal adversary should depend on the current policy, which has restricted applications to off-policy RL. Second, these methods generally assume perturbations based only on the $L_p$-norm, even when prior knowledge of the perturbation distribution in the environment is available. We here introduce another perspective on adversarial RL: an f-divergence constrained problem with the prior knowledge distribution. From this, we derive two typical attacks and their corresponding robust learning frameworks. We evaluate robustness, and the results demonstrate that our proposed methods achieve excellent performance in sample-efficient off-policy RL.
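The abstract frames the adversary as an f-divergence constrained problem around a prior perturbation distribution. As a minimal sketch, assume the KL divergence is the chosen f-divergence, the constraint is handled in penalty (Lagrangian) form with a hypothetical temperature `tau`, and the prior $p$ is approximated by Gaussian noise samples. Then $\min_q \mathbb{E}_q[V] + \tau\,\mathrm{KL}(q \,\|\, p)$ has the closed-form solution $q \propto p \cdot \exp(-V/\tau)$ with optimal value $-\tau \log \mathbb{E}_p[\exp(-V/\tau)]$. The code reweights prior samples accordingly. The critic `toy_value`, the state, and the noise scale are hypothetical, and this is not claimed to be the paper's exact SofA.

```python
import math
import random
from typing import List, Sequence


def soft_worst_case_weights(values: Sequence[float], tau: float) -> List[float]:
    """Weights q_i proportional to exp(-V_i / tau) over perturbations sampled from the prior p.

    This minimizes E_q[V] + tau * KL(q || p) when p is the empirical
    (uniform) distribution over the sampled perturbations.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    logits = [-v / tau for v in values]
    m = max(logits)  # shift by the max logit for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def soft_worst_case_value(values: Sequence[float], tau: float) -> float:
    """Optimal value of min_q E_q[V] + tau * KL(q || p): -tau * log(mean_i exp(-V_i / tau))."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    logits = [-v / tau for v in values]
    m = max(logits)
    log_mean_exp = m + math.log(sum(math.exp(l - m) for l in logits)) - math.log(len(values))
    return -tau * log_mean_exp


def toy_value(obs: float) -> float:
    # Hypothetical critic: estimated value is highest when the observation is near 0.
    return -obs ** 2


random.seed(0)
state = 0.5
deltas = [random.gauss(0.0, 0.3) for _ in range(1000)]  # prior p: Gaussian observation noise
values = [toy_value(state + d) for d in deltas]

weights = soft_worst_case_weights(values, tau=0.1)
# Draw one adversarial perturbation by resampling the prior samples with the soft worst-case weights.
adv_delta = random.choices(deltas, weights=weights, k=1)[0]

for tau in (10.0, 1.0, 0.1):
    print(f"tau={tau}: soft worst-case value = {soft_worst_case_value(values, tau):.3f}")
print(f"mean value under prior = {sum(values) / len(values):.3f}, hard worst case = {min(values):.3f}")
```

As `tau` grows, the value approaches the mean under the prior (plain random noise); as `tau` shrinks, it approaches the hard worst case over the sampled perturbations, so the temperature interpolates between the two.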

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses two main problems:

1. **Extending robust Reinforcement Learning (RL) to off-policy algorithms**: Existing robust RL methods have achieved some success against adversarial perturbations of input observations. However, most of them rely on on-policy algorithms, which prevents their use with the more sample-efficient off-policy algorithms. In addition, existing methods usually assume that the perturbation lies within an Lp-norm ball, which makes it difficult to account for noise distributions common in real-world environments (such as Gaussian noise). The paper therefore introduces an f-divergence-constrained formulation to extend robust deep RL (DRL) methods to recent off-policy Actor-Critic algorithms, paying particular attention to vulnerabilities that arise over the long horizons of Markov Decision Processes (MDPs).
2. **Introducing more realistic adversary models**: Traditional robust learning methods usually assume that the perturbation is bounded in the L∞-norm, which may not match perturbations in real-world environments. The paper instead proposes a new perspective on adversarial RL: it treats the adversary's search as an f-divergence-constrained optimization problem with respect to a prior perturbation distribution. From this perspective, the paper derives two typical attacks, the Soft Worst-Case Attack (SofA) and the ε-Worst-Case Attack (EpsA), and verifies their effectiveness through theoretical analysis and experiments.

Specifically, the main contributions of the paper are:
- Introducing an f-divergence-constrained method that extends robust DRL methods to off-policy algorithms.
- Proposing more flexible and realistic adversary models that better simulate perturbations in real-world environments.

Through these improvements, the paper aims to improve the robustness and performance of RL algorithms in complex and uncertain environments.
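For contrast with a penalized (soft) formulation, a hard f-divergence constraint can also be solved in closed form on prior samples. The sketch below assumes total variation (which is an f-divergence) with radius `eps` and an empirical uniform prior over Gaussian noise samples. It only illustrates an ε-bounded worst case and is not claimed to match the paper's EpsA definition; `toy_value`, the state, and the noise scale are hypothetical.

```python
import random
from typing import List, Sequence


def tv_worst_case_weights(values: Sequence[float], eps: float) -> List[float]:
    """Minimize E_q[V] subject to TV(q, p) <= eps, where p is uniform over the prior samples.

    The optimum moves up to eps probability mass from the highest-value samples
    onto the single lowest-value sample.
    """
    n = len(values)
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    q = [1.0 / n] * n
    order = sorted(range(n), key=lambda i: values[i])  # ascending value
    worst = order[0]
    budget = min(eps, 1.0 - q[worst])  # mass that can still be moved onto the worst sample
    q[worst] += budget
    remaining = budget
    for i in reversed(order[1:]):  # remove mass from the highest-value samples first
        if remaining <= 0.0:
            break
        take = min(q[i], remaining)
        q[i] -= take
        remaining -= take
    return q


def expected_value(weights: Sequence[float], values: Sequence[float]) -> float:
    return sum(w * v for w, v in zip(weights, values))


def tv_distance(q: Sequence[float], p: Sequence[float]) -> float:
    return 0.5 * sum(abs(a - b) for a, b in zip(q, p))


def toy_value(obs: float) -> float:
    # Hypothetical critic: estimated value is highest when the observation is near 0.
    return -obs ** 2


random.seed(0)
state = 0.5
deltas = [random.gauss(0.0, 0.3) for _ in range(1000)]  # prior p: Gaussian observation noise
values = [toy_value(state + d) for d in deltas]
uniform = [1.0 / len(values)] * len(values)

for eps in (0.0, 0.05, 0.2):
    q = tv_worst_case_weights(values, eps)
    print(f"eps={eps}: worst-case value = {expected_value(q, values):.3f}, TV = {tv_distance(q, uniform):.3f}")
```

Because the objective is linear in q, the optimum is a greedy mass transfer: the adversary spends its ε budget moving probability from the observations the critic rates highest onto the one it rates lowest. Setting `eps = 0` recovers plain prior noise, and `eps = 1` recovers the hard worst case over the samples.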
