Abstract: Intelligent agents and multi-agent systems are increasingly used in complex scenarios, such as controlling groups of drones and non-player characters in video games. In these applications, multi-agent navigation and obstacle avoidance are foundational functions. However, these problems become more challenging as the environment grows more complex and as the decision-making interactions among agents become more dynamic. The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is a classical multi-agent reinforcement learning algorithm that has been used successfully to improve agents' performance. However, it ignores the temporal information hidden in agents' interactions with the environment, and its training technique makes it inefficient in scenarios with many agents. To address these limitations, we explore modified versions of MADDPG for multi-agent navigation and obstacle avoidance. By combining MADDPG with Long Short-Term Memory (LSTM), we obtain the MADDPG-LSTMactor algorithm, which takes consecutive observations over time as input to the policy network, enabling the LSTM layer to capture hidden temporal patterns. Moreover, by simplifying the input of the critic network, we obtain the MADDPG-L algorithm, which improves efficiency in scenarios with many agents. Experimental results demonstrate that these algorithms outperform existing networks in the OpenAI multi-agent particle environment. We also conducted a comparative study of the LSTM-based approach against Transformer and self-attention models on the task of multi-agent navigation and obstacle avoidance. The results reveal that Transformer and self-attention do not consistently outperform LSTM, and that the LSTM-based model exhibits a favorable tradeoff across varying sequence lengths. Overall, this work addresses the limitations of MADDPG in multi-agent navigation and obstacle avoidance tasks, providing insights for developing intelligent agents and multi-agent systems.

What problem does this paper attempt to address?

The paper addresses navigation and obstacle avoidance in multi-agent systems and proposes improvements to the existing Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. It points out two limitations of MADDPG in complex environments: the algorithm overlooks the temporal information hidden in the interactions between agents and the environment, and its training efficiency decreases as the number of agents increases. To solve these issues, the authors propose two improved algorithms:

1. **MADDPG-LSTMactor**: This combines a Long Short-Term Memory (LSTM) network with MADDPG, using the LSTM to process a time series of consecutive observations, thereby better capturing hidden temporal patterns and improving decision quality. Experimental results show that this method outperforms the baseline algorithm in multi-agent particle environments.
2. **MADDPG-L**: This simplifies the input of the critic network by considering only the current agent's actions rather than the actions of all agents, thus improving the algorithm's efficiency in scenarios with a large number of agents.

The paper also compares LSTM-based methods with Transformer and self-attention mechanisms on multi-agent navigation and obstacle avoidance tasks. The results indicate that LSTM offers a good trade-off across different sequence lengths. The Transformer improves on long sequences but still does not surpass LSTM on short sequences, while the self-attention mechanism suits shorter sequences and its performance declines rapidly on long ones.
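
To make the idea of feeding consecutive observations to an LSTM-based policy more concrete, here is a minimal sketch of a fixed-length observation window for one agent. It is an illustration only, not the authors' implementation: the class name `ObservationWindow`, the parameter `seq_len`, and the choice to pad the start of an episode by repeating the first observation are all assumptions. The nested list it returns is the kind of sequence an LSTM layer could consume, and `seq_len` corresponds to the sequence length varied in the comparison above.

```python
from collections import deque


class ObservationWindow:
    """Hypothetical helper: keeps the most recent `seq_len` observations of one agent."""

    def __init__(self, seq_len):
        if seq_len < 1:
            raise ValueError("seq_len must be at least 1")
        self.seq_len = seq_len
        self._buffer = deque(maxlen=seq_len)

    def reset(self, first_obs):
        # Assumption: at episode start, pad the window by repeating the
        # first observation so the sequence always has seq_len entries.
        self._buffer.clear()
        for _ in range(self.seq_len):
            self._buffer.append(list(first_obs))

    def push(self, obs):
        # Appending to a full deque silently drops the oldest observation.
        self._buffer.append(list(obs))

    def sequence(self):
        # Oldest observation first, newest last: seq_len rows of obs_dim values.
        return [list(o) for o in self._buffer]


window = ObservationWindow(seq_len=3)
window.reset([0.0, 0.0])
window.push([0.1, 0.0])
window.push([0.2, 0.1])
print(window.sequence())
```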
In summary, this research aims to improve multi-agent reinforcement learning algorithms by incorporating temporal sequence features and explores the applicability and performance of these algorithms in multi-agent systems of different scales.
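
The effect of the simplified critic input on scale can be sketched with a small calculation. The function below is a hypothetical illustration, not code from the paper: it assumes the critic still receives the observations of all agents in both variants (the summary does not say), and the observation and action sizes (18 and 5) are made-up values chosen only to show the trend. Under those assumptions, the action part of the input grows with the number of agents when all actions are used, and stays constant when only the current agent's action is used.

```python
def critic_input_dim(obs_dims, act_dims, agent_index, own_action_only):
    """Size of one agent's critic input vector.

    Assumption: observations of all agents are always included;
    only the action part differs between the two variants.
    """
    state_part = sum(obs_dims)
    if own_action_only:
        # Simplified variant: only the current agent's action is appended.
        action_part = act_dims[agent_index]
    else:
        # Centralized variant: the actions of all agents are appended.
        action_part = sum(act_dims)
    return state_part + action_part


# Hypothetical sizes: every agent observes 18 values and outputs 5 action values.
for n_agents in (3, 6, 12):
    obs_dims = [18] * n_agents
    act_dims = [5] * n_agents
    full = critic_input_dim(obs_dims, act_dims, 0, own_action_only=False)
    reduced = critic_input_dim(obs_dims, act_dims, 0, own_action_only=True)
    print(n_agents, full, reduced)
```

The gap between the two sizes is the summed action dimension of the other agents, so it widens linearly as agents are added.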

Time-aware MADDPG with LSTM for multi-agent obstacle avoidance: a comparative study

The Design and Realization of Multi-agent Obstacle Avoidance based on Reinforcement Learning

Mapless Collaborative Navigation for a Multi-Robot System Based on the Deep Reinforcement Learning

MAPPO method based on attention behavior network

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

Graph MADDPG with RNN for multiagent cooperative environment

DTPPO: Dual-Transformer Encoder-based Proximal Policy Optimization for Multi-UAV Navigation in Unseen Complex Environments

Multi-Agent Path Planning based on MPC and DDPG

Modelling the Dynamic Joint Policy of Teammates with Attention Multi-agent DDPG

Multi-Uav Automatic Dynamic Obstacle Avoidance With Experience-Shared A2c

A path planning algorithm fusion of obstacle avoidance and memory functions

R-MADDPG for Partially Observable Environments and Limited Communication

Novel task decomposed multi-agent twin delayed deep deterministic policy gradient algorithm for multi-UAV autonomous path planning

A Policy Gradient Algorithm to Alleviate the Multi-Agent Value Overestimation Problem in Complex Environments

Research on Multi-Agent Task Allocation and Path Planning Based on Pri-MADDPG

Research on Collision-free Control and Simulation of Single-Agent Based on An Improved DDPG Algorithm

Multi-Agent Reinforcement Learning for Problems with Combined Individual and Team Reward

Cooperative Control of Multiple AGVs Based on Multi-Agent Reinforcement Learning

ME‐MADDPG: An efficient learning‐based motion planning method for multiple agents in complex environments

Expert System-Based Multiagent Deep Deterministic Policy Gradient for Swarm Robot Decision Making