Multiple Ships Cooperative Navigation and Collision Avoidance using Multi-agent Reinforcement Learning with Communication

Y. Wang, Y. Zhao
2024-10-12
Abstract: In the real world, unmanned surface vehicles (USVs) often need to coordinate with each other to accomplish specific tasks. However, achieving cooperative control in multi-agent systems is challenging due to issues such as non-stationarity and partial observability. Recent advancements in Multi-Agent Reinforcement Learning (MARL) provide new perspectives to address these challenges. Therefore, we propose using the multi-agent deep deterministic policy gradient (MADDPG) algorithm with communication to address multiple ships' cooperation problems under partial observability. We developed two tasks based on OpenAI's gym environment: cooperative navigation and cooperative collision avoidance. In these tasks, ships must not only learn effective control strategies but also establish communication protocols with other agents. We analyze the impact of external noise on communication, the effect of inter-agent communication on performance, and the communication patterns learned by the agents. The results demonstrate that our proposed framework effectively addresses cooperative navigation and collision avoidance among multiple vessels, significantly outperforming traditional single-agent algorithms. Agents establish a consistent communication protocol, enabling them to compensate for missing information through shared observations and achieve better coordination.
Robotics, Systems and Control
What problem does this paper attempt to address?
This paper addresses the problem of collaborative navigation and collision avoidance for multiple unmanned surface vessels (USVs) in partially observable marine environments. Specifically, it focuses on the following aspects:

1. **Non-stationarity and partial observability in multi-agent systems**:
   - In a real marine environment, sensor limitations and environmental conditions prevent ships from obtaining perfect global observations. In addition, the interaction between multiple ships makes the environment change dynamically, that is, it is non-stationary from the point of view of any single ship.
   - To address these problems, the paper uses multi-agent reinforcement learning (MARL), specifically the multi-agent deep deterministic policy gradient (MADDPG) algorithm.
2. **Establishment and optimization of communication protocols**:
   - The paper studies how explicit message passing can improve coordination between ships and explores the impact of communication noise on performance. The design of a communication protocol must consider not only effective transmission of information but also issues such as bandwidth limitations and noise interference.
3. **Path planning and collision avoidance strategies**:
   - The paper proposes two tasks based on the OpenAI Gym environment: collaborative navigation and collaborative collision avoidance. In both tasks, ships must learn effective control strategies and collaborate with other ships through communication to navigate safely and avoid collisions.

### Specific problem descriptions

- **Partial observability**: Ships can only obtain local observations, not global information. This complicates decision-making because each ship must choose good actions from incomplete information.
- **Non-stationarity**: In a multi-agent system, the behavior of each ship affects the state transitions experienced by the other ships. While the other ships are still learning and changing their policies, the environment appears non-stationary from the perspective of a single ship. This violates the stationary-transition assumption of the Markov decision process (MDP) on which single-agent methods rely, which makes learning harder.
- **Communication challenges**: Ships need to share information through communication, but the communication channel may be subject to noise and bandwidth limitations. How to design a robust communication protocol is therefore an important research problem.

### Solutions

The paper proposes using the MADDPG algorithm to solve the above problems. The approach has the following characteristics:

- **Centralized training and decentralized execution (CTDE)**: During training, a centralized critic can access the local observations and actions of all ships. Conditioning on every ship's action mitigates the non-stationarity caused by the other ships' changing policies. During execution, each ship acts independently on its own local observation, which supports the scalability and robustness of the system.
- **Communication mechanism**: The framework pairs MADDPG with explicit message passing between ships, which improves coordination. This communication helps compensate for the information missing under partial observability.
- **Experimental verification**: The paper verifies the effectiveness of the proposed method in two simulation scenarios (collaborative navigation and collaborative collision avoidance) and analyzes the impact of communication noise and bandwidth limitations on performance.

In summary, this paper aims to solve the problem of collaborative navigation and collision avoidance for multiple unmanned surface vessels in partially observable environments by combining the MADDPG algorithm with a communication mechanism, thereby improving the safety and efficiency of marine operations.
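The data flow described above (decentralized actors that exchange messages over a noisy channel, and a critic that sees every ship's observation and action during training) can be illustrated with a minimal sketch. This is not the authors' implementation. The dimensions, the names `ShipActor`, `noisy_channel`, `centralized_critic_input`, and `step`, the additive Gaussian noise model, the same-step message delivery, and the random single-layer "networks" are all assumptions made for illustration, and no training loop is included.

```python
import math
import random

# All sizes below are hypothetical; the summary does not state the real ones.
N_SHIPS = 3   # fleet size
OBS_DIM = 4   # length of one ship's local observation
MSG_DIM = 2   # length of one broadcast message
ACT_DIM = 2   # length of one control action (e.g. thrust, rudder)


def make_matrix(rows, cols, rng):
    """Random weights standing in for a trained network layer."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def linear_tanh(weights, inputs):
    """One linear layer followed by tanh, so every output lies in [-1, 1]."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


class ShipActor:
    """Decentralized actor: uses only its own observation and received messages."""

    def __init__(self, n_ships, rng):
        in_dim = OBS_DIM + (n_ships - 1) * MSG_DIM
        self.w_act = make_matrix(ACT_DIM, in_dim, rng)
        self.w_msg = make_matrix(MSG_DIM, OBS_DIM, rng)

    def speak(self, obs):
        """Compute the message this ship broadcasts from its local observation."""
        return linear_tanh(self.w_msg, obs)

    def act(self, obs, inbox):
        """Choose a control action from the observation plus all received messages."""
        flat_inbox = [value for message in inbox for value in message]
        return linear_tanh(self.w_act, obs + flat_inbox)


def noisy_channel(message, sigma, rng):
    """Assumed noise model: independent Gaussian noise added to each component."""
    return [value + rng.gauss(0.0, sigma) for value in message]


def centralized_critic_input(all_obs, all_actions):
    """Training-time only: concatenate every ship's observation and action."""
    flat_obs = [value for obs in all_obs for value in obs]
    flat_act = [value for action in all_actions for value in action]
    return flat_obs + flat_act


def step(actors, all_obs, sigma, rng):
    """One decision step: every ship speaks, then acts on what it heard."""
    messages = [actor.speak(obs) for actor, obs in zip(actors, all_obs)]
    actions = []
    for i, (actor, obs) in enumerate(zip(actors, all_obs)):
        # Each ship hears a separately corrupted copy of every other ship's message.
        inbox = [noisy_channel(m, sigma, rng) for j, m in enumerate(messages) if j != i]
        actions.append(actor.act(obs, inbox))
    return messages, actions


rng = random.Random(0)
actors = [ShipActor(N_SHIPS, rng) for _ in range(N_SHIPS)]
all_obs = [[rng.uniform(-1.0, 1.0) for _ in range(OBS_DIM)] for _ in range(N_SHIPS)]
messages, actions = step(actors, all_obs, sigma=0.1, rng=rng)
critic_in = centralized_critic_input(all_obs, actions)
```

At execution time only `step` is needed, so each ship depends on nothing beyond its own sensors and the messages it receives. `centralized_critic_input` would be used only while training, which is the CTDE split. Raising `sigma` is one simple way to probe how sensitive a learned protocol is to channel noise.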