RL-VAEGAN: Adversarial Defense for Reinforcement Learning Agents Via Style Transfer.

Yueyue Hu, Shiliang Sun
DOI: https://doi.org/10.1016/j.knosys.2021.106967
IF: 8.139
2021-01-01
Knowledge-Based Systems
Abstract: Reinforcement learning (RL) agents parameterized by deep neural networks have achieved great success in many domains. However, deep RL policies have been shown to be vulnerable to adversarial attacks, i.e., inputs with slight perturbations can cause substantial agent failure. Inspired by recent advances in deep generative networks, which have greatly facilitated the development of adversarial attacks, we investigate the adversarial robustness of RL agents and propose a novel defense framework for RL based on the idea of style transfer. More precisely, our defense framework, called RL-VAEGAN, combines variational autoencoders (VAEs) and generative adversarial networks (GANs). It learns the distributions of the styles of the original and adversarial states, respectively, and eliminates the threat of adversarial attacks on RL agents by transferring adversarial states to unperturbed legitimate ones under the shared-content latent space assumption. We empirically show that our method is effective against state-of-the-art attack methods in white-box and black-box scenarios with diverse magnitudes of perturbations.
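The test-time data flow implied by the shared-content assumption can be illustrated with a toy sketch. The abstract does not give the architecture or the losses, so everything below is an assumption made for illustration: `ToyDomain`, `purify`, and `toy_policy` are hypothetical names, and the hand-written additive encoder and decoder stand in for the trained VAE and GAN components. The sketch shows only the idea of encoding an adversarial state into a content code and decoding that code in the clean style before the agent sees it.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class ToyDomain:
    """Hypothetical stand-in for one domain's trained encoder/decoder pair."""

    style: Vector  # fixed style code; a real model would learn a distribution

    def encode_content(self, state: Vector) -> Vector:
        # Toy content encoder: remove this domain's style from the state.
        return [x - s for x, s in zip(state, self.style)]

    def decode(self, content: Vector) -> Vector:
        # Toy decoder: render a content code in this domain's style.
        return [c + s for c, s in zip(content, self.style)]


def purify(state: Vector, adversarial: ToyDomain, clean: ToyDomain) -> Vector:
    """Transfer an adversarial state to the clean domain via the content code."""
    content = adversarial.encode_content(state)  # content assumed shared
    return clean.decode(content)                 # re-rendered in clean style


def toy_policy(state: Vector) -> int:
    # Hypothetical agent: action 1 if the feature sum is positive, else 0.
    return 1 if sum(state) > 0 else 0


if __name__ == "__main__":
    clean = ToyDomain(style=[0.0, 0.0, 0.0])
    adversarial = ToyDomain(style=[-0.3, -0.3, -0.3])  # perturbation as "style"

    clean_state = [0.2, 0.1, 0.1]
    attacked_state = adversarial.decode(clean.encode_content(clean_state))
    defended_state = purify(attacked_state, adversarial, clean)

    print("action on clean state:   ", toy_policy(clean_state))
    print("action on attacked state:", toy_policy(attacked_state))
    print("action on defended state:", toy_policy(defended_state))
```

In this toy setting the perturbation is a fixed additive offset, so the transfer recovers the clean state up to floating-point error. A learned model would only approximate this, and the defense acts as a preprocessing step in front of an unchanged policy.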