Abstract: Wargames are essential tools for simulating a wide range of war scenarios. However, the increasing pace of warfare has rendered traditional wargame decision-making methods inadequate. To address this challenge, wargame-assisted decision-making methods that leverage artificial intelligence techniques, notably reinforcement learning, have emerged as a promising solution. Current wargame environments, however, are characterized by a large decision space and sparse rewards, both of which hinder the optimization of decision-making methods. To overcome these obstacles, this paper presents a wargame decision-making method based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. To adapt MADDPG to the wargame environment, the algorithm is improved by modeling the problem as a Partially Observable Markov Decision Process (POMDP), using a joint action-value function, and applying the Gumbel-Softmax estimator. A wargame decision-making method built on this improved MADDPG algorithm is then proposed. Supervised learning is incorporated into the approach to improve training efficiency and to narrow the space of operations before the reinforcement learning phase begins. In addition, a policy gradient estimator is incorporated to reduce the action space and obtain the globally optimal solution. Finally, an additional reward function is designed to address the sparse reward problem. Experimental results show that the proposed method outperforms both the original, pre-optimization MADDPG algorithm and other algorithms based on the actor-critic (AC) framework in the wargame environment. The approach offers a promising solution to the challenging problem of decision-making in wargame scenarios, particularly given the increasing speed and complexity of modern warfare.
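The abstract lists the Gumbel-Softmax estimator as one of the changes that adapt MADDPG to the wargame environment. MADDPG's actor update differentiates the critic with respect to the action, which is straightforward for continuous actions but not for discrete wargame commands. The paper's exact formulation is not given here, so the following is only a minimal sketch of the standard Gumbel-Softmax relaxation in plain Python. The function names `softmax`, `gumbel_softmax` and `hard_one_hot`, the temperature `tau`, and the three-action logits are illustrative assumptions, not taken from the paper.

```python
import math
import random
from typing import List, Optional


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits: List[float], tau: float = 1.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + Gumbel(0, 1) noise) / tau).

    Hypothetical helper for illustration; tau > 0 is the temperature.
    """
    rng = rng or random.Random()
    noisy = []
    for logit in logits:
        # Clamp u into (0, 1) so that -log(-log(u)) stays finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        g = -math.log(-math.log(u))  # standard Gumbel(0, 1) sample
        noisy.append((logit + g) / tau)
    return softmax(noisy)


def hard_one_hot(probs: List[float]) -> List[float]:
    """One-hot vector at the argmax, i.e. the discrete action actually executed."""
    k = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == k else 0.0 for i in range(len(probs))]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 0.5, -1.0]  # hypothetical actor logits for three discrete actions
    soft = gumbel_softmax(logits, tau=0.5, rng=rng)
    print("relaxed sample:", soft)
    print("one-hot action:", hard_one_hot(soft))
```

In an autograd framework, the usual straight-through variant executes the one-hot action in the forward pass while letting gradients flow through the relaxed sample, so the critic's gradient can reach the actor's logits. This plain-Python sketch shows only the forward computation. Lowering `tau` makes samples closer to one-hot at the cost of higher-variance gradients.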

Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization

A Dynamically Adaptive Approach to Reducing Strategic Interference for Multi-agent Systems

Dueling Network Architecture for Multi-Agent Deep Deterministic Policy Gradient

Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

Multi-Agent Reinforcement Learning Via Directed Exploration Method

DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

A Policy Gradient Algorithm to Alleviate the Multi-Agent Value Overestimation Problem in Complex Environments

A Collaborative Multiagent Reinforcement Learning Method Based on Policy Gradient Potential

Learning Diverse Risk Preferences in Population-based Self-play

Continuously Discovering Novel Strategies Via Reward-Switching Policy Optimization

An Off-Policy Multi-Agent Stochastic Policy Gradient Algorithm for Cooperative Continuous Control

Exploring Dominant Strategies in Iterated and Evolutionary Games: A Multi-Agent Reinforcement Learning Approach

Optimizing Crowdsourcing Task Assignment Policies Using Multi-Agent Reinforcement Learning in Stochastic Games

Role Play: Learning Adaptive Role-Specific Strategies in Multi-Agent Interactions

A Game-Theoretic Approach to Multi-agent Trust Region Optimization

VMAPD: Generate Diverse Solutions for Multi-Agent Games with Recurrent Trajectory Discriminators

Special Agents Policy Gradient In Value Decomposition-based Approach

Toward Finding Strong Pareto Optimal Policies in Multi-Agent Reinforcement Learning

Research on Wargame Decision-Making Method Based on Multi-Agent Deep Deterministic Policy Gradient

Iteratively Learning Novel Strategies with Diversity Measured in State Distances