Abstract: Detecting malicious attacks presents a major challenge in the field of reinforcement learning (RL), as such attacks can force the victim to perform abnormal actions, with potentially severe consequences. To mitigate these risks, current research focuses on the enhancement of RL algorithms with efficient detection mechanisms, especially for real-world applications. Adversarial attacks have the potential to alter the environmental dynamics of a Markov Decision Process (MDP) perceived by an RL agent. Leveraging these changes in dynamics, we propose a novel approach to detect attacks. Our contribution can be summarized in two main aspects. Firstly, we propose a novel formalization of the attack detection problem that entails analyzing modifications made by attacks to the transition and reward dynamics within the environment. This problem can be framed as a context change detection problem, where the goal is to identify the transition from a "free-of-attack" situation to an "under-attack" scenario. Secondly, to solve this problem, we propose a "model-free" clustering-based countermeasure. This approach consists of two essential steps: first, partitioning the transition space into clusters, and then using this partitioning to identify changes in environmental dynamics caused by adversarial attacks. To assess the efficiency of our detection method, we performed experiments on four established RL domains (grid-world, mountain car, cartpole, and acrobot) and subjected them to four advanced attack types: Uniform, Strategically-timed, Q-value, and Multi-objective. Our study shows that our technique has a high potential for perturbation detection, even in scenarios where attackers employ more sophisticated strategies.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper aims to address the challenge of malicious attack detection in Reinforcement Learning (RL). Specifically, these attacks can force the victim to perform abnormal behaviors, potentially leading to severe consequences. To mitigate these risks, current research focuses on enhancing RL algorithms with efficient detection mechanisms, especially in practical applications.

### Background and Motivation

In RL, malicious attackers can manipulate the observation data in the environment, misleading the RL agent during the testing phase. These attacks typically involve small but intentional perturbations to the environment dynamics, causing changes in the agent's behavior that may lead to catastrophic outcomes. For example, in an autonomous driving scenario, an attacker could cause the vehicle to deviate from its normal path into oncoming traffic.

### Solution

To address this issue, the authors propose a clustering-based attack detection method. The main contributions of this method include two aspects:

1. **Problem Formalization**: Formalizing the attack detection problem as analyzing the modifications to the environment's transition and reward dynamics caused by attacks. This problem can be framed as a context change detection problem, aiming to identify the transition from a "no attack" state to an "under attack" state.
2. **Clustering Method**: Proposing a "model-free" clustering method to detect attacks. This method includes two main steps:
   - **Partitioning**: Dividing the transition space into multiple clusters.
   - **Detection**: Using these partitions to identify changes in the environment dynamics caused by adversarial attacks.
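
The two steps above (partitioning, then detection) can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not specify the clustering algorithm or the detection statistic, so the sketch assumes plain k-means over `(s, a, r, s')` vectors and uses the mean distance to the nearest centroid as a drift score. The toy 1-D environment, the function names (`toy_transitions`, `fit_clusters`, `drift_score`, `is_under_attack`), the cluster count, the `factor` threshold, and the deliberately large `attack_shift` are all hypothetical choices made for illustration.

```python
import numpy as np


def toy_transitions(n, rng, attack_shift=0.0):
    """Sample (s, a, r, s') rows from a hypothetical 1-D environment.

    attack_shift is an illustrative perturbation added to the next state;
    it stands in for an attack that alters the perceived dynamics.
    """
    s = rng.uniform(-1.0, 1.0, size=n)
    a = rng.choice([-1.0, 1.0], size=n)
    s_next = s + 0.1 * a + rng.normal(0.0, 0.01, size=n) + attack_shift
    r = -np.abs(s_next)  # reward computed from the (possibly perturbed) next state
    return np.column_stack([s, a, r, s_next])


def nearest_centroid(points, centroids):
    """Return, for each row, the index of and distance to its nearest centroid."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    return labels, dists[np.arange(len(points)), labels]


def fit_clusters(points, k, n_iter=30):
    """Step 1 (assumed): partition the transition space with k-means."""
    # Deterministic farthest-first initialisation to spread the centroids out.
    centroids = [points[0]]
    gap = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        far = points[gap.argmax()]
        centroids.append(far)
        gap = np.minimum(gap, np.linalg.norm(points - far, axis=1))
    centroids = np.array(centroids)
    # Lloyd iterations: reassign points, then move each centroid to its mean.
    for _ in range(n_iter):
        labels = nearest_centroid(points, centroids)[0]
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def drift_score(window, centroids):
    """Step 2 (assumed): mean distance of a window to its nearest centroids."""
    return float(nearest_centroid(window, centroids)[1].mean())


def is_under_attack(window, centroids, baseline, factor=2.0):
    """Flag the window when its score exceeds factor times the clean baseline."""
    return drift_score(window, centroids) > factor * baseline


rng = np.random.default_rng(0)
reference = toy_transitions(2000, rng)            # attack-free transitions
centroids = fit_clusters(reference, k=16)
baseline = drift_score(reference, centroids)

clean_window = toy_transitions(500, rng)
attacked_window = toy_transitions(500, rng, attack_shift=1.0)
clean_score = drift_score(clean_window, centroids)
attacked_score = drift_score(attacked_window, centroids)

print("baseline:", baseline)
print("clean window:", clean_score, is_under_attack(clean_window, centroids, baseline))
print("attacked window:", attacked_score, is_under_attack(attacked_window, centroids, baseline))
```

The sketch is model-free in the sense that it never estimates the transition or reward functions; it only compares incoming transitions against a partition learned from attack-free data. A subtler attack than the one simulated here would likely call for a more sensitive statistic and a threshold calibrated on held-out clean windows.
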
### Experimental Validation

To evaluate the effectiveness of the proposed detection method, the authors conducted experiments in four classic RL domains (Grid World, Mountain Car, CartPole, and Acrobot) and applied four advanced attack types (Uniform, Strategically-timed, Q-value, and Multi-objective) to these domains. The experimental results indicate that the technique has high potential for detecting perturbations, even when attackers employ more sophisticated strategies.

### Conclusion

This paper offers a new perspective on adversarial attack detection in RL. The method detects changes in environment dynamics and is intended to identify attacks in a timely manner in practical applications, thereby helping to prevent dangerous situations or performance degradation.

Clustering-based attack detection for adversarial reinforcement learning
