Abstract: Intelligent agents and multi-agent systems play important roles in applications such as the control of drone swarms. Multi-agent navigation and obstacle avoidance, the foundation of more advanced applications, is therefore of great importance. In these tasks, the interactions between agents' decisions and the dynamic changes of the agents become difficult for traditional route planning or reinforcement learning algorithms to handle as the environment grows more complex. The classical multi-agent reinforcement learning algorithm, multi-agent deep deterministic policy gradient (MADDPG), addressed two problems of earlier algorithms: a non-stationary training process and an inability to deal with environmental randomness. However, MADDPG ignores the temporal information hidden in agents' interactions with the environment. In addition, because its centralized training with decentralized execution (CTDE) scheme has each agent's critic network compute over all agents' actions and the information of the whole environment, it scales poorly to larger numbers of agents. To address MADDPG's neglect of temporal information, this article proposes a new algorithm, MADDPG-LSTMactor, which combines MADDPG with long short-term memory (LSTM). By using an agent's observations over consecutive timesteps as the input to its policy network, it allows the LSTM layer to process the hidden temporal information. Experimental results show that this algorithm achieves better performance in scenarios with a small number of agents. Furthermore, to address MADDPG's inefficiency in scenarios with many agents, this article puts forward a lightweight MADDPG (MADDPG-L) algorithm, which simplifies the input of the critic network. Experimental results show that this algorithm outperforms MADDPG when the number of agents is large.
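The two ideas named in the abstract can be illustrated with a minimal sketch; it is not the article's implementation. The first part keeps a sliding window of one agent's most recent observations, which is the kind of sequence a recurrent (LSTM) actor would consume. The second part computes the input width of a standard MADDPG-style centralized critic, to show how it grows with the number of agents. The names `ObservationWindow` and `centralized_critic_input_dim`, the zero padding at the start of an episode, and the example dimensions (10-dimensional observations, 2-dimensional actions) are all assumptions made for illustration. How MADDPG-L actually reduces the critic input is not specified in the abstract and is not modeled here.

```python
from collections import deque


class ObservationWindow:
    """Keeps the last `length` observations of one agent, oldest first."""

    def __init__(self, length, obs_dim):
        self.length = length
        self.obs_dim = obs_dim
        # Assumption: pad with zeros until enough real observations arrive.
        self.buffer = deque(
            [[0.0] * obs_dim for _ in range(length)], maxlen=length
        )

    def push(self, obs):
        """Append the newest observation; the oldest one is dropped."""
        if len(obs) != self.obs_dim:
            raise ValueError("unexpected observation size")
        self.buffer.append(list(obs))

    def sequence(self):
        """Return a (length x obs_dim) nested list for a recurrent actor."""
        return [list(o) for o in self.buffer]


def centralized_critic_input_dim(obs_dims, act_dims):
    """Input width of one critic that sees every agent's observation and action."""
    return sum(obs_dims) + sum(act_dims)


# Sliding window over five timesteps, keeping only the last three.
window = ObservationWindow(length=3, obs_dim=2)
for t in range(5):
    window.push([float(t), float(t) * 0.5])
seq = window.sequence()

# Critic input width for 3, 6, and 12 agents (hypothetical dimensions).
dims = {n: centralized_critic_input_dim([10] * n, [2] * n) for n in (3, 6, 12)}
```

With these hypothetical dimensions, the input width of each critic grows linearly with the number of agents, and since every agent has its own critic, the total grows quadratically. This is the scaling pressure that motivates a simplified critic input.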
