Abstract: Intelligent agents and multi-agent systems play important roles in scenarios such as the control of drone swarms, and multi-agent navigation and obstacle avoidance, a foundational function for more advanced applications, is of great importance. In multi-agent navigation and obstacle avoidance tasks, the interactive decision-making and dynamic changes of agents become difficult for traditional route planning algorithms or reinforcement learning algorithms to handle as the complexity of the environment increases. The classical multi-agent reinforcement learning algorithm, multi-agent deep deterministic policy gradient (MADDPG), addressed earlier algorithms' problems of non-stationary training and inability to deal with environmental randomness. However, MADDPG ignores the temporal information hidden in agents' interactions with the environment. In addition, because its centralized training with decentralized execution (CTDE) scheme has each agent's critic network compute over the actions of all agents and the information of the whole environment, it scales poorly to larger numbers of agents. To address MADDPG's neglect of the temporal information in the data, this article proposes a new algorithm called MADDPG-LSTMactor, which combines MADDPG with long short-term memory (LSTM). By using an agent's observations over consecutive timesteps as the input of its policy network, it allows the LSTM layer to process the hidden temporal information. Experimental results show that this algorithm performs better in scenarios where the number of agents is small. In addition, to address MADDPG's inefficiency in scenarios with many agents, this article puts forward a lightweight MADDPG (MADDPG-L) algorithm, which simplifies the input of the critic network. Experimental results show that this algorithm outperforms MADDPG when the number of agents is large.
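
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class `ObservationWindow`, the window length, the padding rule at episode start, and the helper `centralized_critic_input_dim` are all hypothetical names and choices made for illustration. The first part shows how observations from consecutive timesteps might be collected into a sequence for an LSTM-based policy network; the second shows why the input of a standard MADDPG critic, which sees every agent's observation and action, grows with the number of agents. How MADDPG-L simplifies that input is not specified in the abstract and is not modelled here.

```python
from collections import deque
from typing import Deque, List, Sequence


class ObservationWindow:
    """Keeps the most recent `length` observations of a single agent.

    Hypothetical helper: the resulting sequence is what an LSTM-based
    actor could consume instead of a single observation.
    """

    def __init__(self, length: int) -> None:
        if length < 1:
            raise ValueError("length must be at least 1")
        self.length = length
        self._buffer: Deque[List[float]] = deque(maxlen=length)

    def reset(self, first_obs: Sequence[float]) -> None:
        # Assumption: at episode start the window is padded with
        # copies of the first observation so its length is constant.
        self._buffer.clear()
        for _ in range(self.length):
            self._buffer.append(list(first_obs))

    def push(self, obs: Sequence[float]) -> None:
        # The deque drops the oldest observation automatically.
        self._buffer.append(list(obs))

    def sequence(self) -> List[List[float]]:
        # Oldest observation first, newest last.
        return [list(o) for o in self._buffer]


def centralized_critic_input_dim(
    obs_dims: Sequence[int], act_dims: Sequence[int]
) -> int:
    """Input size of a critic that sees all observations and actions."""
    return sum(obs_dims) + sum(act_dims)


if __name__ == "__main__":
    window = ObservationWindow(length=3)
    window.reset([0.0, 0.0])
    window.push([1.0, 0.5])
    print(window.sequence())

    # Doubling the number of agents doubles the critic input size.
    print(centralized_critic_input_dim([4] * 3, [2] * 3))
    print(centralized_critic_input_dim([4] * 6, [2] * 6))
```

In this sketch the critic input size is linear in the number of agents, which is the scaling pressure the abstract attributes to the CTDE critic.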

Research on Generalization of Multi-agent Based on Reinforcement Learning

The Multi-Agent System Based on Reinforcement Learning

Dueling Network Architecture for Multi-Agent Deep Deterministic Policy Gradient

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

Twin Delayed Multi-Agent Deep Deterministic Policy Gradient

Deep reinforcement learning algorithm based on multi-agent parallelism and its application in game environment

Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay

Special Agents Policy Gradient In Value Decomposition-based Approach

Enabling Inter-Agent Transfer for Multi-Agent Learning System by Incorporating Role Reversal

A Semi-Independent Policies Training Method with Shared Representation for Heterogeneous Multi-Agents Reinforcement Learning

A new multi-agent reinforcement learning approach

Generalized Multi-Agent Competitive Reinforcement Learning with Differential Augmentation

Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning

The Design and Realization of Multi-agent Obstacle Avoidance based on Reinforcement Learning

A Two-Layered Multi-Agent Reinforcement Learning Model and Algorithm

Intention Propagation for Multi-agent Reinforcement Learning

Quantifying the effects of environment and population diversity in multi-agent reinforcement learning

LMRL: a Multi-Agent Reinforcement Learning Model and Algorithm

Scalable and Transferable Reinforcement Learning for Multi-Agent Mixed Cooperative–Competitive Environments Based on Hierarchical Graph Attention

Multi-agent Reinforcement Learning Algorithm Based on Local Information

Reinforcement Learning Model and Algorithm Based on Multi-agent Cooperation