Abstract: In Multi-agent Reinforcement Learning (MARL), accurately perceiving opponents' strategies is essential for both cooperative and adversarial contexts, particularly within dynamic environments. While Proximal Policy Optimization (PPO) and related algorithms such as Actor-Critic with Experience Replay (ACER), Trust Region Policy Optimization (TRPO), and Deep Deterministic Policy Gradient (DDPG) perform well in single-agent, stationary environments, they suffer from high variance in MARL due to non-stationary and hidden policies of opponents, leading to diminished reward performance. Additionally, existing methods in MARL face significant challenges, including the need for inter-agent communication, reliance on explicit reward information, high computational demands, and sampling inefficiencies. These issues render them less effective in continuous environments where opponents may abruptly change their policies without prior notice. Against this background, we present OPS-DeMo (Online Policy Switch-Detection Model), an online algorithm that employs dynamic error decay to detect changes in opponents' policies. OPS-DeMo continuously updates its beliefs using an Assumed Opponent Policy (AOP) Bank and selects corresponding responses from a pre-trained Response Policy Bank. Each response policy is trained against consistently strategizing opponents, reducing training uncertainty and enabling the effective use of algorithms like PPO in multi-agent environments. Comparative assessments show that our approach outperforms PPO-trained models in dynamic scenarios like the Predator-Prey setting, providing greater robustness to sudden policy shifts and enabling more informed decision-making through precise opponent policy insights.
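
The detection loop described in the abstract can be illustrated with a minimal sketch. It is not the OPS-DeMo implementation: the class `SwitchDetector`, the fixed `decay` factor, the "surprise" error (one minus the assumed probability of the observed action), and the toy policy names are all assumptions made for illustration, and the abstract does not specify how the error decay is made dynamic.

```python
from typing import Callable, Dict, Hashable

# Hypothetical type: maps (state, action) to the probability that an
# assumed opponent policy takes `action` in `state`.
PolicyProb = Callable[[Hashable, Hashable], float]


class SwitchDetector:
    """Tracks a decayed running error for each assumed opponent policy."""

    def __init__(self, aop_bank: Dict[str, PolicyProb], decay: float = 0.9):
        self.aop_bank = aop_bank
        # Assumption: a fixed decay factor stands in for the paper's
        # dynamic error decay, whose exact rule is not given in the abstract.
        self.decay = decay
        self.errors = {name: 0.0 for name in aop_bank}

    def observe(self, state: Hashable, action: Hashable) -> str:
        """Update every running error and return the best-matching policy name."""
        for name, prob in self.aop_bank.items():
            surprise = 1.0 - prob(state, action)  # 0 when the action was fully expected
            self.errors[name] = (
                self.decay * self.errors[name] + (1.0 - self.decay) * surprise
            )
        # The assumed policy with the lowest running error is the current belief.
        return min(self.errors, key=self.errors.get)


# Toy example: two deterministic assumed opponent policies (hypothetical).
aop_bank = {
    "chase": lambda s, a: 1.0 if a == "toward" else 0.0,
    "flee": lambda s, a: 1.0 if a == "away" else 0.0,
}
# Hypothetical response bank: one pre-trained response per assumed policy.
response_bank = {"chase": "evade_policy", "flee": "pursue_policy"}

detector = SwitchDetector(aop_bank, decay=0.5)
# The opponent switches from "toward" to "away" after three steps.
beliefs = [detector.observe(None, a) for a in ["toward"] * 3 + ["away"] * 3]
responses = [response_bank[b] for b in beliefs]
print(beliefs)
print(responses)
```

With a small decay factor the belief follows recent observations closely, so an abrupt switch is picked up quickly; a larger factor would smooth over noisy actions at the cost of slower detection.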

Efficiently tracking multi-strategic opponents: A context-aware Bayesian policy reuse approach

Accurate policy detection and efficient knowledge reuse against multi-strategic opponents

Efficient Policy Detecting and Reusing for Non-Stationarity in Markov Games

A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents

Large Scale Pursuit-Evasion under Collision Avoidance Using Deep Reinforcement Learning

A Plan Recognition Approach for Agent in Adversarial Multi-Agent System

Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation

Efficient use of heuristics for accelerating XCS-based policy learning in Markov games

Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning

Efficient Bayesian Policy Reuse with a Scalable Observation Model in Deep Reinforcement Learning

Robust optimal policies for team Markov games

Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game

Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning

Fast Peer Adaptation with Context-aware Exploration

Selective Policy Transfer in Multi-Agent Systems with Sparse Interactions

Opponent portrait for multiagent reinforcement learning in competitive environment

Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts

Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization

Metric Policy Representations for Opponent Modeling

Multi-agent Reinforcement Learning with Approximate Model Learning for Competitive Games

Multi-Agent Combat in Non-Stationary Environments