Abstract: Intelligent agents and multi-agent systems are increasingly used in complex scenarios, such as controlling groups of drones and non-player characters in video games. In these applications, multi-agent navigation and obstacle avoidance are foundational functions. These problems become harder as the environment grows more complex and as the decision-making interactions among agents become more dynamic. The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is a classical multi-agent reinforcement learning algorithm that has been used successfully to improve agents’ performance. However, it ignores the temporal information hidden in agents’ interactions with the environment, and its training technique makes it inefficient in scenarios with many agents. To address these limitations, we explore modified versions of MADDPG for multi-agent navigation and obstacle avoidance. By combining MADDPG with Long Short-Term Memory (LSTM), we obtain the MADDPG-LSTMactor algorithm, which feeds consecutive observations over time into the policy network so that the LSTM layer can capture hidden temporal patterns. Moreover, by simplifying the input of the critic network, we obtain the MADDPG-L algorithm, which improves efficiency in scenarios with many agents. Experimental results demonstrate that these algorithms outperform existing networks in the OpenAI multi-agent particle environment. We also conducted a comparative study of the LSTM-based approach against Transformer and self-attention models on the task of multi-agent navigation and obstacle avoidance. The results reveal that Transformer and self-attention do not consistently outperform LSTM, and that the LSTM-based model exhibits a favorable tradeoff across varying sequence lengths. Overall, this work addresses the limitations of MADDPG in multi-agent navigation and obstacle avoidance tasks, providing insights for developing intelligent agents and multi-agent systems.
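The abstract describes the policy network as receiving consecutive observations over time. As an illustration only, the following minimal sketch shows one way such a per-agent observation sequence could be assembled before it is passed to a recurrent actor. The class name `ObservationWindow`, the window length, and the zero-padding at the start of an episode are assumptions made for this example and are not taken from the paper; the LSTM itself is not shown.

```python
from collections import deque
from typing import Deque, List, Sequence


class ObservationWindow:
    """Hypothetical helper: keeps the most recent `length` observations of one agent."""

    def __init__(self, length: int, obs_dim: int) -> None:
        if length < 1:
            raise ValueError("length must be at least 1")
        self.length = length
        self.obs_dim = obs_dim
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self._buffer: Deque[List[float]] = deque(maxlen=length)

    def reset(self) -> None:
        """Clear the history, e.g. at the start of a new episode."""
        self._buffer.clear()

    def push(self, observation: Sequence[float]) -> None:
        """Append the newest observation to the history."""
        if len(observation) != self.obs_dim:
            raise ValueError("unexpected observation size")
        self._buffer.append(list(observation))

    def sequence(self) -> List[List[float]]:
        """Return `length` rows of size `obs_dim`, oldest first.

        Missing early steps are filled with zeros (an assumed padding scheme).
        """
        missing = self.length - len(self._buffer)
        padding = [[0.0] * self.obs_dim for _ in range(missing)]
        return padding + [list(obs) for obs in self._buffer]


if __name__ == "__main__":
    window = ObservationWindow(length=3, obs_dim=2)
    window.push([0.1, 0.2])
    window.push([0.3, 0.4])
    # The resulting rows would be the time-ordered input of a recurrent policy network.
    for row in window.sequence():
        print(row)
```

In a setup like this, each agent would keep its own window, and the sequence length would be a tunable hyperparameter, in line with the abstract's remark that the comparison is made across varying sequence lengths.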

Dueling Network Architecture for Multi-Agent Deep Deterministic Policy Gradient

The Design and Realization of Multi-agent Obstacle Avoidance based on Reinforcement Learning

Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms

Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient

A Policy Gradient Algorithm to Alleviate the Multi-Agent Value Overestimation Problem in Complex Environments

Modelling the Dynamic Joint Policy of Teammates with Attention Multi-agent DDPG

Multi-Agent Cooperation Decision-Making by Reinforcement Learning with Encirclement Rewards

Multi-Agent Reinforcement Learning for Problems with Combined Individual and Team Reward

A TD3-based multi-agent deep reinforcement learning method in mixed cooperation-competition environment

A Collaborative Multiagent Reinforcement Learning Method Based on Policy Gradient Potential

Scalable and Sample Efficient Distributed Policy Gradient Algorithms in Multi-Agent Networked Systems

Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments

Research on Wargame Decision-Making Method Based on Multi-Agent Deep Deterministic Policy Gradient

Research on Multi-Agent Task Allocation and Path Planning Based on Pri-MADDPG

Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning

Deep reinforcement learning algorithm based on multi-agent parallelism and its application in game environment

Time-aware MADDPG with LSTM for multi-agent obstacle avoidance: a comparative study

MAGNet: Multi-agent Graph Network for Deep Multi-agent Reinforcement Learning

Preference-based experience sharing scheme for multi-agent reinforcement learning in multi-target environments

Message-Dropout: An Efficient Training Method for Multi-Agent Deep Reinforcement Learning