Abstract: Intelligent agents and multi-agent systems play important roles in applications such as the control of drone swarms, and multi-agent navigation and obstacle avoidance, a foundational capability for more advanced applications, is therefore of great importance. In multi-agent navigation and obstacle avoidance tasks, the interactive decision-making and dynamic behavior of agents become difficult for traditional route planning algorithms or reinforcement learning algorithms to handle as the complexity of the environment increases. The classical multi-agent reinforcement learning algorithm, multi-agent deep deterministic policy gradient (MADDPG), addressed two problems of earlier algorithms: a non-stationary training process and an inability to deal with environment randomness. However, MADDPG ignores the temporal information hidden in agents' interactions with the environment. Moreover, because its centralized training with decentralized execution (CTDE) scheme has each agent's critic network compute over all agents' actions and the whole environment's information, it scales poorly to larger numbers of agents. To address MADDPG's neglect of temporal information in the data, this article proposes a new algorithm called MADDPG-LSTMactor, which combines MADDPG with long short-term memory (LSTM). By feeding an agent's observations over consecutive timesteps into its policy network, the algorithm allows the LSTM layer to extract the hidden temporal information. Experimental results demonstrate that this algorithm performs better than MADDPG in scenarios where the number of agents is small. In addition, to address MADDPG's inefficiency in scenarios with many agents, this article puts forward a lightweight MADDPG (MADDPG-L) algorithm, which simplifies the input of the critic network. Experimental results show that this algorithm performs better than MADDPG when the number of agents is large.
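The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. `ObservationWindow` is a hypothetical helper that collects an agent's observations over consecutive timesteps into the kind of sequence a recurrent (LSTM) policy network would take as input. `centralized_critic_input_dim` counts the inputs of a standard MADDPG critic, which sees every agent's observation and action. `reduced_critic_input_dim` is an assumption made only for illustration, since the abstract does not state how MADDPG-L simplifies the critic input.

```python
from collections import deque


class ObservationWindow:
    """Hypothetical helper: keeps the last `seq_len` observations of one agent."""

    def __init__(self, seq_len, obs_dim):
        self.seq_len = seq_len
        self.obs_dim = obs_dim
        # Pre-fill with zero vectors so a full-length sequence is available
        # from the very first environment step.
        self.buffer = deque(
            ([0.0] * obs_dim for _ in range(seq_len)), maxlen=seq_len
        )

    def push(self, obs):
        """Append the newest observation; the oldest one is dropped."""
        if len(obs) != self.obs_dim:
            raise ValueError("observation has the wrong dimension")
        self.buffer.append(list(obs))

    def sequence(self):
        """Return observations ordered oldest to newest (seq_len x obs_dim)."""
        return [list(o) for o in self.buffer]


def centralized_critic_input_dim(obs_dims, act_dims):
    """Standard MADDPG critic: all agents' observations plus all actions."""
    return sum(obs_dims) + sum(act_dims)


def reduced_critic_input_dim(obs_dims, act_dims, agent):
    """ASSUMPTION, not taken from the paper: a critic that only sees the
    agent's own observation and own action, so its input size no longer
    grows with the number of agents."""
    return obs_dims[agent] + act_dims[agent]


if __name__ == "__main__":
    window = ObservationWindow(seq_len=3, obs_dim=2)
    window.push([1.0, 2.0])
    window.push([3.0, 4.0])
    print(window.sequence())

    for n_agents in (3, 12):
        obs_dims, act_dims = [4] * n_agents, [2] * n_agents
        print(
            n_agents,
            centralized_critic_input_dim(obs_dims, act_dims),
            reduced_critic_input_dim(obs_dims, act_dims, agent=0),
        )
```

The centralized critic's input grows linearly with the number of agents, and every agent has its own critic, which is one way to see why scaling becomes costly. The fixed-length window turns a stream of single observations into the sequence a recurrent policy expects.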

Hybrid Attention-Oriented Experience Replay for Deep Reinforcement Learning and Its Application to a Multi-Robot Cooperative Hunting Problem

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

Mapless Collaborative Navigation for a Multi-Robot System Based on the Deep Reinforcement Learning

Research on multi-UAV task decision-making based on improved MADDPG algorithm and transfer learning

An Improved Approach Towards Multi-Agent Pursuit–Evasion Game Decision-Making Using Deep Reinforcement Learning

Decomposed and Prioritized Experience Replay-based MADDPG Algorithm for Multi-UAV Confrontation

A Dynamically Adaptive Approach to Reducing Strategic Interference for Multi-agent Systems

Prioritized Experience Replay–Based Path Planning Algorithm for Multiple UAVs

Leveraging Efficiency Through Hybrid Prioritized Experience Replay in Door Environment

A Multi-agent Deep Reinforcement Learning Method for UAVs Cooperative Pursuit Problem

Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay

A Many-to-Many UAV Pursuit and Interception Strategy Based on PERMADDPG

Hindsight-aware Deep Reinforcement Learning Algorithm for Multi-Agent Systems

Underwater Multi-agent Cooperative Formation Hunting Based on Deep Reinforcement Learning

Multi-robot Cooperative Pursuit via Potential Field-Enhanced Reinforcement Learning

An Investigation on Multi-UAVs Cooperative Control Algorithm for Target Chasing

The Design and Realization of Multi-agent Obstacle Avoidance based on Reinforcement Learning

Cooperative Hunting Method of Unmanned Surface Vehicle based on Attention Mechanism

Prioritized Experience Replay for Multi-agent Cooperation

A Deep Reinforcement Learning-Based Method Applied for Solving Multi-Agent Defense and Attack Problems

Cooperative Multi-Target Hunting by Unmanned Surface Vehicles Based on Multi-Agent Reinforcement Learning