Abstract: There is a concerted effort to build intelligent maritime capabilities, and numerous artificial intelligence technologies have been explored. A growing number of researchers are working on deep reinforcement learning algorithms, whose mainstream application is in games. Reinforcement learning has mastered chess, a perfect-information game, and Texas hold'em poker, an imperfect-information game. It has also matched or even surpassed top human players in e-sports games with huge state spaces and complex action spaces. However, reinforcement learning still faces great challenges in fields such as autonomous driving. The main reason is that training requires an environment in which agents can interact, yet realistic simulation scenes are very difficult to construct, and there is no guarantee that the agent will not encounter states it has never seen. It is therefore necessary to explore simulated scenarios first, and this paper accordingly studies reinforcement learning in a simulated scenario; migrating such methods to real-world applications remains highly challenging, especially for missions at sea. For a heterogeneous multi-agent adversarial game scenario, this paper proposes a sea-battlefield confrontation decision algorithm based on the multi-agent deep deterministic policy gradient (MADDPG). The algorithm combines long short-term memory (LSTM) with an actor-critic architecture, which both achieves convergence in huge state and action spaces and addresses the problem of sparse real rewards. In addition, imitation learning is integrated into the decision algorithm, which speeds up convergence and greatly improves the algorithm's effectiveness.
The results show that the algorithm can handle a variety of tactical sea-battlefield scenarios and make flexible decisions as the enemy changes its behavior, achieving an average win rate close to 90%.
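The abstract does not give network details, so the following NumPy snippet is only a rough sketch of how the named pieces could fit together in one MADDPG-style training step. It rests on several assumptions. Each agent's decentralized actor encodes its own observation history with a single LSTM cell. Agent 0's centralized critic is a plain linear function of all agents' latest observations and actions. Imitation learning is modeled as a behavior-cloning penalty that pulls the actor toward a demonstrated action. All names, sizes and hyperparameters (`HID`, `GAMMA`, `BC_WEIGHT`, `expert_act`, and so on) are hypothetical. The sketch only computes the TD target and the two losses; the gradient updates would come from an autodiff framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and hyperparameters; the abstract does not specify them.
N_AGENTS, OBS_DIM, ACT_DIM, HID = 2, 4, 2, 8
GAMMA, BC_WEIGHT = 0.95, 0.5


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def lstm_step(x, h, c, W):
    """One LSTM cell step; W maps [x, h] to the stacked input/forget/output/candidate gates."""
    i, f, o, g = np.split(W @ np.concatenate([x, h]), 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c


def init_actor():
    return {"W_lstm": rng.normal(0.0, 0.1, (4 * HID, OBS_DIM + HID)),
            "W_out": rng.normal(0.0, 0.1, (ACT_DIM, HID))}


def actor(obs_seq, params):
    """Decentralized deterministic policy mu_i: LSTM over the agent's own observation history."""
    h, c = np.zeros(HID), np.zeros(HID)
    for o in obs_seq:
        h, c = lstm_step(o, h, c, params["W_lstm"])
    return np.tanh(params["W_out"] @ h)  # continuous action in (-1, 1)


def critic(all_obs, all_acts, w):
    """Centralized critic Q_i: a linear stand-in over every agent's observation and action."""
    return float(w @ np.concatenate([*all_obs, *all_acts]))


actors = [init_actor() for _ in range(N_AGENTS)]
target_actors = [{k: v.copy() for k, v in p.items()} for p in actors]
critic_w = rng.normal(0.0, 0.1, N_AGENTS * (OBS_DIM + ACT_DIM))  # only agent 0's critic is built here
target_critic_w = critic_w.copy()


def td_target(reward, next_obs_seqs, gamma=GAMMA):
    """y_i = r_i + gamma * Q_i'(x', a_1', ..., a_N'), with a_j' = mu_j'(o_j') from the target actors."""
    next_acts = [actor(s, p) for s, p in zip(next_obs_seqs, target_actors)]
    return reward + gamma * critic([s[-1] for s in next_obs_seqs], next_acts, target_critic_w)


def actor_loss(i, obs_seqs, acts, expert_act, bc_weight=BC_WEIGHT):
    """-Q_i with only agent i's action replaced by mu_i(o_i), plus a behavior-cloning penalty."""
    joint = list(acts)
    joint[i] = actor(obs_seqs[i], actors[i])
    q = critic([s[-1] for s in obs_seqs], joint, critic_w)
    bc = float(np.sum((joint[i] - expert_act) ** 2))
    return -q + bc_weight * bc


# One sampled transition with a hypothetical history length of T = 3 steps per agent.
T = 3
obs_seqs = [rng.normal(size=(T, OBS_DIM)) for _ in range(N_AGENTS)]
next_obs_seqs = [rng.normal(size=(T, OBS_DIM)) for _ in range(N_AGENTS)]
acts = [actor(s, p) for s, p in zip(obs_seqs, actors)]

y = td_target(reward=1.0, next_obs_seqs=next_obs_seqs)
q = critic([s[-1] for s in obs_seqs], acts, critic_w)
critic_loss = (q - y) ** 2
a_loss = actor_loss(0, obs_seqs, acts, expert_act=np.zeros(ACT_DIM))
print(f"y={y:.3f}  critic_loss={critic_loss:.4f}  actor_loss={a_loss:.4f}")
```

In this sketch the critic sees the joint observation-action vector only during training, while each actor acts on its own history. This is the centralized-training, decentralized-execution split that MADDPG is built around. The abstract does not say whether the paper applies imitation learning as an extra loss term like this one, as pretraining, or in some other way.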

ME‐MADDPG: An efficient learning‐based motion planning method for multiple agents in complex environments

Learning to Cooperate: Application of Deep Reinforcement Learning for Online AGV Path Finding

Research on Multi-Agent Task Allocation and Path Planning Based on Pri-MADDPG

Multi-agent collaborative path planning algorithm with reinforcement learning and combined prioritized experience replay in Internet of Things

Improving Sample Efficiency in Multi-Agent Actor-Critic Methods

The Design and Realization of Multi-agent Obstacle Avoidance based on Reinforcement Learning

Expert System-Based Multiagent Deep Deterministic Policy Gradient for Swarm Robot Decision Making

Multi-Agent Path Planning based on MPC and DDPG

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

Time-aware MADDPG with LSTM for multi-agent obstacle avoidance: a comparative study

Multi-agent policy learning-based path planning for autonomous mobile robots

Path Planning in Complex Environments Using Attention-Based Deep Deterministic Policy Gradient

A Policy Gradient Algorithm to Alleviate the Multi-Agent Value Overestimation Problem in Complex Environments

A MADDPG-based multi-agent antagonistic algorithm for sea battlefield confrontation

Optimal operation of regional integrated energy system based on multi-agent deep deterministic policy gradient algorithm

Efficient Path Planning for Mobile Robot Based on Deep Deterministic Policy Gradient

Cooperative Control of Multiple AGVs Based on Multi-Agent Reinforcement Learning

A Decentralized Multi-Agent Path Planning Approach Based on Imitation Learning and Selective Communication

Multi-UAV pursuit-evasion gaming based on PSO-M3DDPG schemes

From Nash Q-learning to nash-MADDPG: Advancements in multiagent control for multiproduct flexible manufacturing systems

Multiple unmanned aerial vehicle coordinated strikes against ground targets based on an improved multi-agent deep deterministic policy gradient algorithm