Robust Deep Reinforcement Learning Through Adversarial Attacks and Training : A Survey

Lucas Schott, Josephine Delas, Hatem Hajri, Elies Gherbi, Reda Yaich, Nora Boulahia-Cuppens, Frederic Cuppens, Sylvain Lamprier
2024-03-01
Abstract: Deep Reinforcement Learning (DRL) is an approach for training autonomous agents across various complex environments. Despite its significant performance in well-known environments, it remains susceptible to minor variations in conditions, raising concerns about its reliability in real-world applications. To improve usability, DRL must demonstrate trustworthiness and robustness. One way to improve the robustness of DRL to unknown changes in conditions is Adversarial Training, in which the agent is trained against well-suited adversarial attacks on the dynamics of the environment. Addressing this critical issue, our work presents an in-depth analysis of contemporary adversarial attack methodologies, systematically categorizing them and comparing their objectives and operational mechanisms. This classification offers a detailed insight into how adversarial attacks can be used to evaluate the resilience of DRL agents, thereby paving the way for enhancing their robustness.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The main problem this paper addresses is how to improve the robustness and reliability of deep reinforcement learning (DRL) when environmental conditions change. Although DRL performs well in known environments, in practical applications it is very sensitive to minor changes in conditions, which raises concerns about its reliability. To make DRL more robust to unknown changes in conditions, the paper explores adversarial training, that is, training agents under adversarial attacks in order to evaluate and improve their robustness. Specifically, the paper focuses on the following aspects:

1. **Robustness issues**:
   - DRL agents perform well in simulated environments, but their performance may decline when they are transferred to real-world applications. This is the so-called "reality gap".
   - Perturbations in the real world (such as sensor failures or differences in physical characteristics) may cause DRL agents to make wrong decisions or suffer performance degradation.
   - Adversarial attacks generate small, intentionally designed perturbations aimed at misleading neural network decisions, thereby revealing the vulnerabilities of DRL agents.
2. **Adversarial training**:
   - Adversarial training introduces adversarial samples during the training stage, so that agents learn to deal with a range of possible perturbations and are more robust at deployment.
   - The goal of adversarial training is to enable agents to maintain good performance in the face of unknown changes in conditions.
3. **Classification and comparison**:
   - The paper systematically classifies and compares existing adversarial attack methods and divides them into two categories: observation alterations and dynamics alterations.
   - Observation alterations change the observations received by the agent, while dynamics alterations change the dynamics of the environment, such as the state-transition function.
4. **Contributions**:
   - Formally defines the concept of robustness in DRL.
   - Proposes a new classification system that organizes all types of perturbations into a unified model.
   - Reviews and classifies the adversarial attack methods in the existing literature.
   - Explores how adversarial attacks can be used to improve the robustness of DRL agents.

Through these studies, the paper aims to provide a comprehensive framework for understanding and improving the robustness of DRL agents, making them more reliable and trustworthy in real-world applications.

### Formula summary

- **Cumulative reward**:
  \[ R(\tau)=\sum_{t=0}^{|\tau|}\gamma^t R(s_t,a_t,s_{t+1}) \]
- **Optimal policy**:
  \[ \pi^*=\arg\max_{\pi}E_{\tau\sim\pi_{\Omega}}[R(\tau)] \]
- **Value function**:
  \[ V^{\pi}(s)=E_{\tau\sim\pi_{\Omega}}[R(\tau)\mid s_0=s] \]
- **Q-value function**:
  \[ Q^{\pi}(s,a)=E_{\tau\sim\pi_{\Omega}}[R(\tau)\mid s_0=s,a_0=a] \]
- **Adversarial sample generation**:
  \[ \min_{x'}\|x-x'\|\quad\text{s.t.}\quad f_{\theta}(x)\neq f_{\theta}(x') \]
- **Robust optimization problem**:
  \[ \pi^*=\arg\max_{\pi}\min_{\tilde{\Phi}\in R}E_{\phi\sim\tilde{\Phi}(\phi|\pi)}E_{\tau\sim\pi_{\phi,\Omega}}[R(\tau)] \]

These formulas show D
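
As a rough illustration of the cumulative reward and the max-min structure of the robust optimization problem summarized above, the following Python sketch may help. It is not the paper's method: the function names `discounted_return` and `robust_choice`, the policy and perturbation labels, and all numbers in `toy_returns` are hypothetical. It also assumes a small finite set of candidate policies and perturbations whose expected returns are already known, whereas the formula takes an expectation over perturbations drawn from an adversarial distribution.

```python
from typing import Dict, Sequence, Tuple


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Compute R(tau) = sum_t gamma^t * r_t for one trajectory's rewards."""
    total = 0.0
    # Accumulate from the last reward backwards (Horner's scheme),
    # which avoids computing gamma**t explicitly.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def robust_choice(
    expected_return: Dict[str, Dict[str, float]]
) -> Tuple[str, float]:
    """Pick the policy whose worst-case expected return is highest.

    expected_return[policy][perturbation] is assumed to be a precomputed
    expected return; this is a finite stand-in for the max-min objective.
    """
    # Inner minimization: worst perturbation for each policy.
    worst_case = {
        policy: min(by_perturbation.values())
        for policy, by_perturbation in expected_return.items()
    }
    # Outer maximization: policy with the best worst case.
    best_policy = max(worst_case, key=worst_case.get)
    return best_policy, worst_case[best_policy]


# Hypothetical numbers, chosen only to show that the nominally best
# policy need not be the most robust one.
toy_returns = {
    "nominal_policy": {"no_perturbation": 10.0, "shifted_dynamics": 2.0},
    "robust_policy": {"no_perturbation": 8.0, "shifted_dynamics": 6.0},
}

if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
    print(robust_choice(toy_returns))
```

In this toy table the policy with the highest unperturbed return is not the one selected, because the selection is driven by the worst case over perturbations rather than by nominal performance.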