Abstract: Deep reinforcement learning (DRL) algorithms learn by interacting with an environment rather than from labeled data. In high-dimensional state spaces, they adapt their policies to maximize the reward they collect. They have applications in fields such as search and rescue, reconnaissance, military operations, firefighting, and autonomous vehicles. However, there are situations in which these algorithms struggle. Simulation environments typically assume that observations are received exactly. When a neural network model encounters inputs that differ from those seen during training, it may fail to make accurate predictions, which makes DRL policies vulnerable to the corrupted state data that can occur in real-world applications. In this study, the State Adversarial Markov Decision Process (SA-MDP) was investigated as a way to increase robustness, and a state-perturbation adversarial attack model was integrated into the DRL algorithm. To make appropriate decisions under perturbation, a guide actor, which is used only during training and makes decisions from clean observation data, guides a control actor, which makes decisions from the outputs of the perturbation model. The proposed algorithm was applied to a target encirclement task with 3, 5, and 7 agents in multi-agent simulation environments built with the Pyglet library. The guided approach was applied to both the multi-agent soft actor-critic (MA-SAC) and multi-agent twin delayed deep deterministic policy gradient (MA-TD3) algorithms. The results show that the proposed approach performs close to MA-SAC and MA-TD3 trained in noise-free environments.
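The guide/control arrangement described in the abstract can be illustrated with a small, purely hypothetical sketch. The abstract does not specify the perturbation model, the network architecture, or the guidance loss, so everything below is an assumption made for illustration: the perturbation is uniform noise bounded by `eps`, both actors are toy linear policies, and the guidance signal is a mean squared difference between the two actors' actions. The names `perturb`, `linear_actor`, and `guidance_loss` are invented here and do not come from the paper.

```python
import random


def perturb(obs, eps, rng):
    """Assumed perturbation model: shift each component by uniform noise in [-eps, eps]."""
    return [x + rng.uniform(-eps, eps) for x in obs]


def linear_actor(weights, obs):
    """Toy deterministic actor: one action value per weight row, clipped to [-1, 1]."""
    return [max(-1.0, min(1.0, sum(w * x for w, x in zip(row, obs))))
            for row in weights]


def guidance_loss(guide_action, control_action):
    """Assumed guidance term: mean squared difference between the two actions."""
    diffs = [(g - c) ** 2 for g, c in zip(guide_action, control_action)]
    return sum(diffs) / len(diffs)


rng = random.Random(0)
obs = [0.2, -0.1, 0.4]                          # clean observation (hypothetical values)
guide_w = [[0.5, -0.3, 0.8], [0.1, 0.9, -0.4]]  # guide actor weights (hypothetical)
control_w = [[0.4, -0.2, 0.7], [0.0, 1.0, -0.5]]  # control actor weights (hypothetical)

noisy_obs = perturb(obs, eps=0.05, rng=rng)     # what the control actor observes
a_guide = linear_actor(guide_w, obs)            # guide acts on the clean observation
a_control = linear_actor(control_w, noisy_obs)  # control acts on the perturbed observation
loss = guidance_loss(a_guide, a_control)        # would be minimized w.r.t. the control actor
print(loss)
```

In a setup like this, only the control actor would be kept at deployment time, which matches the abstract's statement that the guide actor is used only during training. How the guidance term is combined with the MA-SAC or MA-TD3 objectives is not stated in the abstract and is left open here.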

Adaptive Deep Reinforcement Learning for Non-Stationary Environments

Deep Reinforcement Learning in Nonstationary Environments With Unknown Change Points

Exploring the Vulnerability of Deep Reinforcement Learning-based Emergency Control for Low Carbon Power Systems

A Behavior-Aware Approach for Deep Reinforcement Learning in Non-stationary Environments without Known Change Points

Adaptive Disassembly Sequence Planning for VR Maintenance Training Via Deep Reinforcement Learning

An intelligent generating method for multi-target attacking strategy based on environment-aware deep reinforcement learning

Dynamic Weight Adjusting Deep Q-Networks for Real-Time Environmental Adaptation

Reinforcement learning algorithm for non-stationary environments

Multi-Agent Guided Deep Reinforcement Learning Approach Against State Perturbed Adversarial Attacks

Context-Aware Safe Reinforcement Learning for Non-Stationary Environments

DQN with model-based exploration: efficient learning on environments with sparse rewards

Dynamic Path Planning of Unknown Environment Based on Deep Reinforcement Learning

Path Planning of Autonomous Mobile Robot in Comprehensive Unknown Environment Using Deep Reinforcement Learning

Adversarial Deep Reinforcement Learning for Cyber Security in Software Defined Networks

Deep Q-Learning Based Reinforcement Learning Approach for Network Intrusion Detection

Deep Reinforcement Learning for Cyber System Defense under Dynamic Adversarial Uncertainties

Residual Physics and Post-Posed Shielding for Safe Deep Reinforcement Learning Method

Deep Q-Network for Stochastic Process Environments

An Adaptive Deep RL Method for Non-Stationary Environments with Piecewise Stable Context

A deep reinforcement learning method for multi-stage equipment development planning in uncertain environments

Safe Decision Controller for Autonomous Driving Based on Deep Reinforcement Learning in Nondeterministic Environment