Abstract: In the real world, unmanned surface vehicles (USVs) often need to coordinate with each other to accomplish specific tasks. However, achieving cooperative control in multi-agent systems is challenging due to issues such as non-stationarity and partial observability. Recent advancements in Multi-Agent Reinforcement Learning (MARL) provide new perspectives to address these challenges. Therefore, we propose using the multi-agent deep deterministic policy gradient (MADDPG) algorithm with communication to address multiple ships' cooperation problems under partial observability. We developed two tasks based on OpenAI's gym environment: cooperative navigation and cooperative collision avoidance. In these tasks, ships must not only learn effective control strategies but also establish communication protocols with other agents. We analyze the impact of external noise on communication, the effect of inter-agent communication on performance, and the communication patterns learned by the agents. The results demonstrate that our proposed framework effectively addresses cooperative navigation and collision avoidance among multiple vessels, significantly outperforming traditional single-agent algorithms. Agents establish a consistent communication protocol, enabling them to compensate for missing information through shared observations and achieve better coordination.

What problem does this paper attempt to address?

This paper addresses collaborative navigation and collision avoidance for multiple unmanned surface vehicles (USVs) in partially observable marine environments. It focuses on the following aspects:

1. **Non-stationarity and partial observability in multi-agent systems**:
   - In real marine environments, sensor limitations and environmental conditions prevent ships from obtaining perfect global observations. In addition, the interaction between multiple ships makes the environment dynamic, that is, non-stationary, from the perspective of any single ship.
   - To address these problems, the paper applies multi-agent reinforcement learning (MARL), specifically the multi-agent deep deterministic policy gradient (MADDPG) algorithm.
2. **Establishment and optimization of communication protocols**:
   - The paper studies how explicit message passing can improve coordination between ships and explores the impact of communication noise on performance. A communication protocol must not only transmit information effectively but also cope with bandwidth limitations and noise interference.
3. **Path planning and collision avoidance strategies**:
   - The paper proposes two tasks based on the OpenAI Gym environment: collaborative navigation and collaborative collision avoidance. In both tasks, ships must learn effective control strategies and collaborate with other ships through communication to navigate safely and avoid collisions.

### Specific problem descriptions

- **Partial observability**: Ships can obtain only local observations, not global information. This complicates decision-making because each ship must act on incomplete information.
- **Non-stationarity**: In a multi-agent system, the behavior of each ship affects the environment state experienced by the other ships, so the environment is non-stationary from the perspective of a single ship. This violates the stationarity assumption of the Markov decision process (MDP) and makes learning harder.
- **Communication challenges**: Ships need to share information through communication, but the channel may be noisy and bandwidth-limited. Designing a robust communication protocol is therefore an important research problem.

### Solutions

The paper proposes using the MADDPG algorithm to solve the above problems. The approach has the following characteristics:

- **Centralized training and decentralized execution (CTDE)**: During training, a centralized critic can access the local observations and actions of all ships, which mitigates non-stationarity. During execution, each ship acts independently on its own local observation, which supports the scalability and robustness of the system.
- **Communication mechanism**: The framework pairs MADDPG with explicit message passing between ships to improve coordination. This communication helps compensate for the information lost to partial observability.
- **Experimental verification**: The paper verifies the effectiveness of the proposed method in two simulation scenarios (collaborative navigation and collaborative collision avoidance) and analyzes the impact of communication noise and bandwidth limitations on performance.

In summary, this paper aims to solve collaborative navigation and collision avoidance for multiple USVs in partially observable environments by combining the MADDPG algorithm with a communication mechanism, thereby improving the safety and efficiency of marine operations.
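The CTDE split and the noisy message channel described above can be illustrated with a small sketch. This is not the paper's implementation: the function names, the list-based observation and message layout, and the additive Gaussian noise model are assumptions made here for illustration. The sketch shows only which information each component would see, plus the standard one-step target used to train a DDPG-style critic.

```python
import random
from typing import List, Sequence


def noisy_broadcast(messages: Sequence[Sequence[float]], sigma: float,
                    rng: random.Random) -> List[List[float]]:
    """Assumed channel model: add zero-mean Gaussian noise to each message component."""
    return [[m + rng.gauss(0.0, sigma) for m in msg] for msg in messages]


def actor_input(agent: int, observations: Sequence[Sequence[float]],
                received: Sequence[Sequence[float]]) -> List[float]:
    """Decentralized execution: own local observation plus the other agents' messages."""
    others = [x for j, msg in enumerate(received) if j != agent for x in msg]
    return list(observations[agent]) + others


def critic_input(observations: Sequence[Sequence[float]],
                 actions: Sequence[Sequence[float]]) -> List[float]:
    """Centralized training: the critic sees every agent's observation and action."""
    flat_obs = [x for obs in observations for x in obs]
    flat_act = [x for act in actions for x in act]
    return flat_obs + flat_act


def td_target(reward: float, gamma: float, next_q: float, done: bool) -> float:
    """One-step critic target y = r + gamma * Q'(x', a'), with no bootstrap at episode end."""
    return reward + (0.0 if done else gamma * next_q)


# Hypothetical three-ship example: 2-D local observations, 1-D messages, 2-D actions.
rng = random.Random(0)
obs = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
msgs = [[0.1], [0.2], [0.3]]
acts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

received = noisy_broadcast(msgs, sigma=0.05, rng=rng)
ship0_view = actor_input(0, obs, received)   # 2 own values + 2 noisy messages
critic_view = critic_input(obs, acts)        # 6 observation values + 6 action values
```

Each actor's input grows only with the number of received messages, while the critic's input grows with the joint observation and action size. That asymmetry is why the critic is used only during training and each ship can still act on local information at execution time.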

Multiple Ships Cooperative Navigation and Collision Avoidance using Multi-agent Reinforcement Learning with Communication

Learning to Cooperate: Application of Deep Reinforcement Learning for Online AGV Path Finding.

Mapless Collaborative Navigation for a Multi-Robot System Based on the Deep Reinforcement Learning

Dynamic Navigation and Area Assignment of Multiple USVs Based on Multi-Agent Deep Reinforcement Learning

Underwater Multi-agent Cooperative Formation Hunting Based on Deep Reinforcement Learning

Multi-USVs Coordinated Detection in Marine Environment with Deep Reinforcement Learning.

HA-MARL: Heuristic and APF Assisted Multi-Agent Reinforcement Learning for Wireless Data Sharing in AUV Swarms

Multi-USV Dynamic Navigation and Target Capture: A Guided Multi-Agent Reinforcement Learning Approach

Hierarchical and Stable Multiagent Reinforcement Learning for Cooperative Navigation Control

Multi-Agent Deep Reinforcement Learning Framework Strategized by Unmanned Aerial Vehicles for Multi-Vessel Full Communication Connection

Control and Coordination of a SWARM of Unmanned Surface Vehicles using Deep Reinforcement Learning in ROS

Safe Multi-Agent Reinforcement Learning for Behavior-Based Cooperative Navigation

Multi-robot Cooperative Navigation Method based on Multi-agent Reinforcement Learning in Sparse Reward Tasks

Reinforcement Learning Based Obstacle Avoidance for AUV Swarm in Dynamic Ocean Environment

Communication-Efficient Decentralized Multi-Agent Reinforcement Learning for Cooperative Adaptive Cruise Control

UAV Cooperative Air Combat Maneuvering Confrontation Based on Multi-agent Reinforcement Learning

Multi-Robot Cooperative Socially-Aware Navigation Using Multi-Agent Reinforcement Learning

Maximizing UAV Coverage in Maritime Wireless Networks: A Multiagent Reinforcement Learning Approach

Secure and Cooperative Target Tracking Via AUV Swarm - A Reinforcement Learning Approach.

A Novel Deep Reinforcement Learning for POMDP-based Autonomous Ship Collision Decision-Making

Distributed Information Fusion Based Trajectory Tracking for USV and UAV Clusters Via Multi-Agent Deep Learning Approach