Abstract: Intelligent agents and multi-agent systems play important roles in applications such as the control of drone swarms. Multi-agent navigation and obstacle avoidance is a foundational capability for such advanced applications and is therefore of great importance. In these tasks, the interactive decision-making and dynamic behavior of the agents become difficult for traditional path planning or reinforcement learning algorithms to handle as the environment grows more complex. The classical multi-agent reinforcement learning algorithm, multi-agent deep deterministic policy gradient (MADDPG), addressed two problems of earlier algorithms: a non-stationary training process and the inability to cope with environmental randomness. However, MADDPG ignores the temporal information hidden in the agents' interactions with the environment. Moreover, because its centralized training with decentralized execution (CTDE) scheme has each agent's critic network operate on the actions of all agents and on the full environment information, it scales poorly to larger numbers of agents. To address the first limitation, this article proposes a new algorithm, MADDPG-LSTMactor, which combines MADDPG with long short-term memory (LSTM). An agent's observations over consecutive timesteps are used as the input to its policy network, which allows the LSTM layer to extract the hidden temporal information. Experimental results show that this algorithm performs better in scenarios with a small number of agents. To address the second limitation, this article puts forward a lightweight MADDPG (MADDPG-L) algorithm, which simplifies the input of the critic network. Experimental results show that this algorithm outperforms MADDPG when the number of agents is large.
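The two ideas in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. The class `ObservationWindow` and the function `centralized_critic_input_dim` are hypothetical names introduced here. The sketch assumes that every agent has the same observation and action dimensions, that the window is zero-padded at the start of an episode, and that a standard MADDPG critic concatenates the observations and actions of all agents. How MADDPG-L reduces the critic input is not stated in the abstract, so only the growth of the unreduced input is shown.

```python
from collections import deque


class ObservationWindow:
    """Keeps the last `seq_len` observations of one agent (hypothetical helper)."""

    def __init__(self, seq_len, obs_dim):
        self.seq_len = seq_len
        self.obs_dim = obs_dim
        # Assumption: pad with zero vectors so the sequence length is fixed
        # from the first timestep of an episode.
        self.buffer = deque(
            ([0.0] * obs_dim for _ in range(seq_len)), maxlen=seq_len
        )

    def push(self, obs):
        """Append the newest observation; the oldest one is dropped."""
        if len(obs) != self.obs_dim:
            raise ValueError("unexpected observation size")
        self.buffer.append(list(obs))

    def sequence(self):
        """Return the window oldest-first as a (seq_len, obs_dim) nested list.

        A sequence of this shape is what a recurrent (LSTM) actor would consume.
        """
        return [list(o) for o in self.buffer]


def centralized_critic_input_dim(n_agents, obs_dim, act_dim):
    """Input size of a critic that sees every agent's observation and action.

    Assumption: homogeneous agents, inputs concatenated into one vector.
    """
    return n_agents * (obs_dim + act_dim)
```

With `obs_dim = 4` and `act_dim = 2`, the centralized critic input grows from 18 values for 3 agents to 60 values for 10 agents, which is the linear growth that motivates a simplified critic input when the number of agents is large.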

Reinforcement learning for multi-agent formation navigation with scalability

Learning to Cooperate: Application of Deep Reinforcement Learning for Online AGV Path Finding.

Moving Forward in Formation: A Decentralized Hierarchical Learning Approach to Multi-Agent Moving Together

Mapless Collaborative Navigation for a Multi-Robot System Based on the Deep Reinforcement Learning

Multi-Robot Learning Dynamic Obstacle Avoidance in Formation with Information-Directed Exploration.

Relative Distributed Formation and Obstacle Avoidance with Multi-agent Reinforcement Learning

Obstacle Avoidance in Multi-Agent Formation Process Based on Deep Reinforcement Learning

Hierarchical and Stable Multiagent Reinforcement Learning for Cooperative Navigation Control

Underwater Multi-agent Cooperative Formation Hunting Based on Deep Reinforcement Learning

The Design and Realization of Multi-agent Obstacle Avoidance based on Reinforcement Learning

Flexible Formation Control Using Hausdorff Distance: A Multi-agent Reinforcement Learning Approach.

Sim-real joint experimental verification for an unmanned surface vehicle formation strategy based on multi-agent deterministic policy gradient and line of sight guidance

Adaptive Leader-Follower Formation Control and Obstacle Avoidance via Deep Reinforcement Learning

Learning-Based Multi-Robot Formation Control With Obstacle Avoidance

The crowd cooperation approach for formation maintenance and collision avoidance using multi-agent deep reinforcement learning

Safe Multi-Agent Reinforcement Learning for Behavior-Based Cooperative Navigation

Oracle-Guided Deep Reinforcement Learning for Large-Scale Multi-UAVs Flocking and Navigation.

Multi-UAV Behavior-based Formation with Static and Dynamic Obstacles Avoidance via Reinforcement Learning

Multiple Ships Cooperative Navigation and Collision Avoidance using Multi-agent Reinforcement Learning with Communication

Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient

Multi-Robot Collaborative Navigation with Formation Adaptation