Abstract: In Multi-agent Reinforcement Learning (MARL), accurately perceiving opponents' strategies is essential in both cooperative and adversarial contexts, particularly within dynamic environments. Proximal Policy Optimization (PPO) and related algorithms such as Actor-Critic with Experience Replay (ACER), Trust Region Policy Optimization (TRPO), and Deep Deterministic Policy Gradient (DDPG) perform well in single-agent, stationary environments, but they suffer from high variance in MARL because opponents' policies are non-stationary and hidden, which leads to diminished reward performance. Existing MARL methods also face significant challenges, including the need for inter-agent communication, reliance on explicit reward information, high computational demands, and sampling inefficiencies. These issues make them less effective in continuous environments where opponents may abruptly change their policies without prior notice. Against this background, we present OPS-DeMo (Online Policy Switch-Detection Model), an online algorithm that employs dynamic error decay to detect changes in opponents' policies. OPS-DeMo continuously updates its beliefs using an Assumed Opponent Policy (AOP) Bank and selects the corresponding response from a pre-trained Response Policy Bank. Each response policy is trained against opponents that follow a consistent strategy, which reduces training uncertainty and enables the effective use of algorithms like PPO in multi-agent environments. Comparative assessments show that our approach outperforms PPO-trained models in dynamic scenarios such as the Predator-Prey setting, providing greater robustness to sudden policy shifts and enabling more informed decision-making through precise opponent policy insights.
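The detection loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `SwitchDetector`, the `decay` and `threshold` parameters, and the surprise measure (one minus the probability an assumed policy gives to the observed action) are all assumptions made for illustration, and a fixed exponential decay stands in for the paper's dynamic error decay, whose exact form the abstract does not give.

```python
from typing import Callable, Dict, Hashable

# Hypothetical type: an assumed opponent policy maps a state to action probabilities.
Policy = Callable[[Hashable], Dict[str, float]]


class SwitchDetector:
    """Illustrative running-error tracker over an assumed opponent policy bank."""

    def __init__(self, aop_bank: Dict[str, Policy], response_bank: Dict[str, str],
                 decay: float = 0.5, threshold: float = 1.2) -> None:
        self.aop_bank = aop_bank            # assumed opponent policies, keyed by name
        self.response_bank = response_bank  # name of the pre-trained response per policy
        self.decay = decay                  # assumption: fixed decay, not the paper's dynamic one
        self.threshold = threshold          # assumption: hand-picked switch threshold
        self.errors = {name: 0.0 for name in aop_bank}
        self.current = next(iter(aop_bank))  # start by believing the first policy

    def observe(self, state: Hashable, action: str) -> str:
        """Update every running error with one observed opponent action."""
        for name, policy in self.aop_bank.items():
            # Surprise is high when the assumed policy gave the action low probability.
            surprise = 1.0 - policy(state).get(action, 0.0)
            self.errors[name] = self.decay * self.errors[name] + surprise
        # Switch belief only when the current assumption has become implausible.
        if self.errors[self.current] > self.threshold:
            self.current = min(self.errors, key=self.errors.get)
        return self.response_bank[self.current]


# Toy usage with two deterministic assumed policies (hypothetical names).
bank = {"left": lambda s: {"L": 1.0}, "right": lambda s: {"R": 1.0}}
responses = {"left": "guard_left", "right": "guard_right"}
detector = SwitchDetector(bank, responses)
for opponent_action in ["L", "L", "L", "R", "R", "R"]:
    chosen_response = detector.observe(None, opponent_action)
```

Because the running error decays, a single unexpected action does not trigger a switch; the belief changes only after the evidence against the current assumption accumulates past the threshold. With stochastic opponent policies the threshold would have to sit above the steady-state error of a correctly assumed policy, which is a tuning choice this sketch leaves open.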

Agent Probing Interaction Policies

Interactive Agent Modeling by Learning to Probe

Environment Probing Interaction Policies

Active Probing and Influencing Human Behaviors Via Autonomous Agents

Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy

Non-cooperative Multi-agent Systems with Exploring Agents

ProAgent: Building Proactive Cooperative Agents with Large Language Models

Learning Intuitive Policies Using Action Features

Learning Effective Communication for Cooperative Pursuit with Multi-Agent Reinforcement Learning

Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization

Multi-Agent Sparse Interaction Modeling is an Anomaly Detection Problem

Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness

Stabilizing Multi-Agent Deep Reinforcement Learning by Implicitly Estimating Other Agents’ Behaviors

Explaining the Behaviour of Reinforcement Learning Agents in a Multi-Agent Cooperative Environment Using Policy Graphs

Selective Policy Transfer in Multi-Agent Systems with Sparse Interactions

Prosocial learning agents solve generalized Stag Hunts better than selfish ones

Multi-agent cooperation through learning-aware policy gradients

Learning Latent Representations to Influence Multi-Agent Interaction

MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization

Learning to Switch Among Agents in a Team via 2-Layer Markov Decision Processes

Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation