Abstract: In multi-agent domains, dealing with non-stationary opponents that switch behaviors (policies) over time remains a challenging problem: the agent must detect the opponent's current policy accurately and respond with the corresponding optimal policy. Previous works commonly assume that the opponent's observations and actions are known during online interaction, which significantly limits their applicability, especially in partially observable environments. This paper focuses on efficient policy detection and reuse against non-stationary opponents without access to their local information. We propose an algorithm called Bayesian policy reuse with LocAl oBservations (Bayes-Lab), which incorporates variational autoencoders (VAEs) into the Bayesian policy reuse (BPR) framework. Following the centralized training with decentralized execution (CTDE) paradigm, we train a VAE as an opponent model during the offline phase to extract the latent relationship between the agent's local observations and the opponent's local observations. During online execution, the trained opponent models reconstruct the opponent's local observations, which are combined with episodic rewards to update the belief about the opponent's policy. Finally, the agent reuses the best response policy under the updated belief to improve online performance. In a predator-prey scenario, Bayes-Lab outperforms existing state-of-the-art methods in terms of detection accuracy, cumulative rewards, and episodic rewards. In this environment, Bayes-Lab achieves about 80% detection accuracy and the highest cumulative rewards, and its performance is less affected by the opponent's policy switching interval than that of the other methods. When the switching interval is less than 10, its detection accuracy is at least 10% higher than that of the other algorithms.
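
The belief update and policy reuse step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the reward table, the Gaussian reward model, the observation likelihoods, and the choice to multiply the two likelihoods are all assumptions made for illustration, and every name below is hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_belief(belief, likelihoods):
    """One Bayes step: the posterior is proportional to likelihood * prior."""
    unnormalized = [b * lik for b, lik in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        return list(belief)  # evidence rules out nothing usable; keep the prior
    return [u / total for u in unnormalized]


def best_response(belief, expected_reward):
    """Index of the response policy with the highest belief-weighted reward."""
    scores = [
        sum(b * row[o] for o, b in enumerate(belief))
        for row in expected_reward
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Assumed performance model: expected_reward[r][o] is the mean episodic reward
# of response policy r against opponent policy o (values are made up).
expected_reward = [
    [10.0, 2.0, 1.0],
    [1.0, 9.0, 3.0],
    [2.0, 2.0, 8.0],
]
REWARD_STD = 2.0  # assumed spread of episodic rewards

belief = [1.0 / 3.0] * 3   # uniform prior over three opponent policies
used_response = 0          # response policy played in the last episode
observed_reward = 2.5      # episodic reward actually received

# Likelihood of the observed reward under each candidate opponent policy.
reward_lik = [
    gaussian_pdf(observed_reward, expected_reward[used_response][o], REWARD_STD)
    for o in range(3)
]

# Assumed likelihoods of the reconstructed opponent observations under each
# candidate policy (in the paper these would come from the opponent models).
obs_lik = [0.1, 0.7, 0.2]

# Assumption: the two evidence sources are combined by multiplication.
likelihoods = [r * o for r, o in zip(reward_lik, obs_lik)]

belief = update_belief(belief, likelihoods)
next_response = best_response(belief, expected_reward)
print(belief, next_response)
```

With these made-up numbers, a low reward under response 0 together with observation evidence favoring the second opponent policy shifts most of the belief to that policy, so the sketch selects the response policy that performs best against it.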
