Robust off-policy Reinforcement Learning via Soft Constrained Adversary

Kosuke Nakanishi, Akihiro Kubo, Yuji Yasui, Shin Ishii
2024-08-31
Abstract: Recently, robust reinforcement learning (RL) methods against input observation have garnered significant attention and undergone rapid evolution due to RL's potential vulnerability. Although these advanced methods have achieved reasonable success, there have been two limitations when considering an adversary in terms of long-term horizons. First, the mutual dependency between the policy and its corresponding optimal adversary limits the development of off-policy RL algorithms; although obtaining the optimal adversary should depend on the current policy, this has restricted applications to off-policy RL. Second, these methods generally assume perturbations based only on the $L_p$-norm, even when prior knowledge of the perturbation distribution in the environment is available. We here introduce another perspective on adversarial RL: an f-divergence constrained problem with the prior knowledge distribution. From this, we derive two typical attacks and their corresponding robust learning frameworks. The evaluation of robustness is conducted and the results demonstrate that our proposed methods achieve excellent performance in sample-efficient off-policy RL.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses two problems:

1. **Extending robust reinforcement learning (RL) to off-policy algorithms**: Existing robust RL methods have had some success against adversarial perturbations of input observations. However, most of them rely on on-policy algorithms, which keeps them from being used with the more sample-efficient off-policy algorithms. They also usually assume that the perturbation lies in an $L_p$-norm ball, which makes it hard to account for noise distributions that are common in real-world environments, such as Gaussian noise. The paper therefore aims to extend robust deep RL (DRL) methods to recent off-policy actor-critic algorithms by introducing an f-divergence-constrained formulation, with particular attention to vulnerabilities that arise from the Markov decision process (MDP) structure.
2. **Introducing more realistic adversary models**: Traditional robust learning methods usually assume that the perturbation is bounded in the $L_\infty$-norm, which may not match the perturbations seen in real-world environments. The paper instead treats the adversary's search as an f-divergence-constrained optimization problem built on a prior perturbation distribution. From this perspective it derives two typical attack methods, the Soft Worst-Case Attack (SofA) and the ε-Worst-Case Attack (EpsA), and examines them through theoretical analysis and experiments.

Specifically, the main contributions of the paper are:

- Introducing an f-divergence-constrained method that extends robust DRL methods to off-policy algorithms.
- Proposing more flexible and realistic adversary models that better reflect the perturbations found in real-world environments.

Through these improvements, the paper aims to improve the robustness and performance of reinforcement learning algorithms in complex and uncertain environments.
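To make the idea of an adversary constrained by an f-divergence to a prior perturbation distribution more concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes the KL divergence as the f-divergence, a Gaussian prior over observation noise, and a hypothetical `toy_value` function standing in for a learned critic. Under those assumptions, the adversary that minimizes the expected value plus a temperature-weighted KL penalty reweights prior samples by `exp(-value / temperature)`, which gives a "soft" rather than hard worst case.

```python
import numpy as np

def soft_worst_case_weights(values, temperature):
    """Weights of a KL-regularized adversary over samples drawn from the prior.

    Minimizing E_q[value] + temperature * KL(q || prior) gives
    q proportional to prior * exp(-value / temperature). For samples
    already drawn from the prior, this is a softmax over -value / temperature.
    """
    values = np.asarray(values, dtype=float)
    logits = -values / temperature
    logits -= logits.max()          # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def toy_value(obs):
    """Hypothetical stand-in for a critic: higher near the origin."""
    return -np.sum(obs ** 2, axis=-1)

rng = np.random.default_rng(0)
obs = np.array([0.5, -0.2])         # true observation (illustrative)
sigma = 0.1                         # assumed std of the Gaussian prior noise

# Candidate perturbed observations sampled from the prior.
candidates = obs + sigma * rng.standard_normal((256, 2))
values = toy_value(candidates)

# Low-value candidates receive more weight; the prior keeps them plausible.
weights = soft_worst_case_weights(values, temperature=0.05)
perturbed_obs = candidates[rng.choice(len(candidates), p=weights)]
```

As the temperature grows, the weights approach uniform and the adversary falls back to the prior noise; as it shrinks, the weights concentrate on the single lowest-value candidate, which resembles a hard worst-case attack restricted to the sampled perturbations.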