Time-aware MADDPG with LSTM for multi-agent obstacle avoidance: a comparative study

Enyu Zhao,Ning Zhou,Chanjuan Liu,Houfu Su,Yang Liu,Jinmiao Cong
DOI: https://doi.org/10.1007/s40747-024-01389-0
IF: 6.7
2024-03-02
Complex & Intelligent Systems
Abstract:Abstract Intelligent agents and multi-agent systems are increasingly used in complex scenarios, such as controlling groups of drones and non-player characters in video games. In these applications, multi-agent navigation and obstacle avoidance are foundational functions. However, problems become more challenging with the increased complexity of the environment and the dynamic decision-making interactions among agents. The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is a classical multi-agent reinforcement learning algorithm successfully used to improve agents’ performance. However, it ignores the temporal message hidden in agents’ interaction with the environment and needs to be more efficient in scenarios with many agents due to its training technique. To address the limitations of MADDPG, we propose to explore modified algorithms of MADDPG for multi-agent navigation and obstacle avoidance. By combining MADDPG with Long Short-Term Memory (LSTM), we obtain the MADDPG-LSTMactor algorithm, which leverages continuous observations over time as input for the policy network, enabling the LSTM layer to capture hidden temporal patterns. Moreover, by simplifying the input of the critic network, we obtain the MADDPG-L algorithm for efficiency improvement in scenarios with many agents. Experimental results demonstrate that these algorithms outperform existing networks in the OpenAI multi-agent particle environment. We also conducted a comparative study of the LSTM-based approach with Transformer and self-attention models in the task of multi-agent navigation and obstacle avoidance. The results reveal that Transformer and self-attention do not consistently outperform LSTM. The LSTM-based model exhibits a favorable tradeoff across varying sequence lengths. Overall, this work addresses the limitations of MADDPG in multi-agent navigation and obstacle avoidance tasks, providing insights for developing intelligent agents and multi-agent systems.
computer science, artificial intelligence
What problem does this paper attempt to address?
The paper primarily focuses on addressing the navigation and obstacle avoidance problems in multi-agent systems and proposes improvements to the existing Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, which has certain limitations. Specifically, the paper points out that when performing multi-agent navigation and obstacle avoidance in complex environments, the traditional MADDPG algorithm overlooks the temporal sequence information hidden in the interactions between agents and the environment. Additionally, as the number of agents increases, the training efficiency decreases. To solve these issues, the authors propose two improved algorithms: 1. **MADDPG-LSTMactor**: This combines the Long Short-Term Memory (LSTM) network with MADDPG, utilizing LSTM to process continuous time series observation data, thereby better capturing hidden temporal patterns and improving decision quality. Experimental results show that this method outperforms the baseline algorithm in multi-agent particle environments. 2. **MADDPG-L**: This simplifies the input of the critic network by considering only the current agent's actions rather than the actions of all agents, thus improving the algorithm's efficiency in scenarios with a large number of agents. Additionally, the paper compares the performance of LSTM-based methods with Transformer and self-attention mechanisms in multi-agent navigation and obstacle avoidance tasks. The results indicate that LSTM exhibits a good trade-off effect for different sequence lengths; while Transformer shows improved performance on long sequence data, it still cannot surpass LSTM's performance on short sequence data; the self-attention mechanism is more suitable for handling shorter data sequences but its performance rapidly declines on long sequence data. In summary, this research aims to improve multi-agent reinforcement learning algorithms by incorporating temporal sequence features and explores the applicability and performance of these algorithms in multi-agent systems of different scales.