Abstract: This paper investigates intelligent strategy design for large-scale pursuit-evasion games with multiple pursuers and multiple evaders. Because the number of agents is vast, the notorious "curse of dimensionality" seriously challenges traditional designs for multi-player pursuit-evasion games, especially in harsh environments where limited communication resources constrain information exchange among players. To address this challenge, the emerging Mean Field Games (MFG) theory is used to derive the optimal pursuit-evasion strategies from a new form of probability density function (PDF) rather than from detailed information about all the other players/agents. This reduces both the information exchange and the computational dimension of the optimal strategy derivation. Specifically, MFG is integrated into the pursuit-evasion game to generate a hierarchical structure in which the pursuers and the evaders form two separate mean field groups. To solve the mean field equations online, i.e., two coupled partial differential equations, the actor-critic reinforcement learning mechanism is adopted and extended to a novel actor-critic-mass-opponent (ACMO) approach. In ACMO, the actor neural network estimates the optimal control, the critic neural network approximates the optimal cost function, the mass neural network learns the PDF of the agent's own group, and the opponent neural network predicts the opponents' average states in the form of the PDF that causes the maximum cost for the agent's group. Lyapunov theory is used to provide the convergence analysis for all neural networks and the stability analysis for the closed-loop system. Finally, a series of numerical simulations demonstrates the effectiveness of the developed scheme.
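
The dimension reduction described in the abstract can be illustrated with a minimal sketch. This is not the paper's ACMO method: it only shows that a group summarized by a PDF has a fixed-size representation however many agents it contains. The scalar states on [0, 1), the histogram estimator, the bin count, and the helper name `empirical_pdf` are all assumptions made for illustration.

```python
import numpy as np

def empirical_pdf(states, bins=20, value_range=(0.0, 1.0)):
    """Histogram estimate of a group's state density on a fixed grid.

    Hypothetical helper: the paper learns the PDF with a neural network;
    a histogram is swapped in here only to keep the sketch short.
    """
    density, edges = np.histogram(
        states, bins=bins, range=value_range, density=True
    )
    return density, edges

rng = np.random.default_rng(0)

# Assumed toy data: scalar evader states drawn uniformly from [0, 1).
for n_agents in (100, 10_000):
    evader_states = rng.uniform(0.0, 1.0, size=n_agents)
    density, edges = empirical_pdf(evader_states)
    # The summary length equals the number of bins, not the number of agents.
    total_mass = float(np.sum(density * np.diff(edges)))
    print(n_agents, density.shape, total_mass)
```

Under these assumptions, a pursuer that receives `density` handles 20 numbers whether the opposing group has 100 or 10,000 members, which is the sense in which a PDF replaces per-agent information.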

Actor-Critic Reinforcement Learning Algorithms for Mean Field Games in Continuous Time, State and Action Spaces

Self-play Reinforcement Learning with Comprehensive Critic in Computer Games

Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces

Reinforcement Learning for Finite Space Mean-Field Type Games

A Single Online Agent Can Efficiently Learn Mean Field Games

Reinforcement Learning for Mean Field Game

Provable Fictitious Play for General Mean-Field Games

Discrete-Time Mean Field Control with Environment States

Learning in Herding Mean Field Games: Single-Loop Algorithm with Finite-Time Convergence Analysis

Model-free Reinforcement Learning for Non-stationary Mean Field Games

Game-theoretical control with continuous action sets

Actor-Critic Policy Optimization in a Large-Scale Imperfect-Information Game

Model-Free Reinforcement Learning for Mean Field Games

Mean-Field Learning: a Survey

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

Reinforcement Learning for Continuous-Time Optimal Execution: Actor-Critic Algorithm and Error Analysis

Scalable Offline Reinforcement Learning for Mean Field Games

Policy Consensus-Based Distributed Deterministic Multi-Agent Reinforcement Learning over Directed Graphs

A Robust Mean-Field Actor-Critic Reinforcement Learning Against Adversarial Perturbations on Agent States

Unified Policy Optimization for Continuous-action Reinforcement Learning in Non-stationary Tasks and Games

Decentralized optimal large scale multi-player pursuit-evasion strategies: A mean field game approach with reinforcement learning