Fast Peer Adaptation with Context-aware Exploration

Long Ma, Yuanfei Wang, Fangwei Zhong, Song-Chun Zhu, Yizhou Wang

2024-08-09

Abstract: Fast adapting to unknown peers (partners or opponents) with different strategies is a key challenge in multi-agent games. To do so, it is crucial for the agent to probe and identify the peer's strategy efficiently, as this is the prerequisite for carrying out the best response in adaptation. However, exploring the strategies of unknown peers is difficult, especially when the games are partially observable and have a long horizon. In this paper, we propose a peer identification reward, which rewards the learning agent based on how well it can identify the behavior pattern of the peer over the historical context, such as the observation over multiple episodes. This reward motivates the agent to learn a context-aware policy for effective exploration and fast adaptation, i.e., to actively seek and collect informative feedback from peers when uncertain about their policies and to exploit the context to perform the best response when confident. We evaluate our method on diverse testbeds that involve competitive (Kuhn Poker), cooperative (PO-Overcooked), or mixed (Predator-Prey-W) games with peer agents. We demonstrate that our method induces more active exploration behavior, achieving faster adaptation and better outcomes than existing methods.

Artificial Intelligence, Machine Learning, Multiagent Systems

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper addresses how an agent can quickly adapt to unknown peers (partners or opponents) in multi-agent games. Specifically:

1. **Rapid Adaptation Challenge**: Agents must quickly identify and adapt to peers that follow different strategies. This requires probing and recognizing the peer's strategy efficiently, since identification is the prerequisite for carrying out the best response.
2. **Exploration in Partially Observable Environments**: Exploring peer strategies is especially difficult when the game is partially observable and has a long horizon. The paper proposes a "peer identification reward", which rewards the learning agent according to how well it can identify the peer's behavior pattern from the historical context (such as observations over multiple episodes).
3. **Balancing Exploration and Exploitation**: The agent must learn a context-aware policy that actively seeks informative feedback when it is uncertain about the peer's policy (exploration) and exploits the context to perform the best response when it is confident (exploitation).

The resulting method, PACE (Peer Adaptation with Context-aware Exploration), is reported to adapt faster and achieve better outcomes than existing methods on competitive (Kuhn Poker), cooperative (PO-Overcooked), and mixed (Predator-Prey-W) testbeds.
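To make the idea of a peer identification reward concrete, here is a minimal sketch in Python. It is not the paper's implementation: the summary does not give the formula, so the sketch assumes the reward is the log-probability that some identifier assigns to the true peer, given scores computed from the context. The function names, the `coef` weight, and the additive combination with the task reward are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def peer_identification_reward(logits: Sequence[float], true_peer: int) -> float:
    """Assumed form: log-probability assigned to the true peer.

    `logits` stands in for the output of a hypothetical identifier that
    scores each candidate peer from the interaction context so far.
    """
    return log_softmax(logits)[true_peer]


def shaped_reward(task_reward: float, logits: Sequence[float],
                  true_peer: int, coef: float = 0.1) -> float:
    """Task reward plus a weighted identification bonus (assumed additive)."""
    return task_reward + coef * peer_identification_reward(logits, true_peer)


# An uninformative context leaves the identifier uniform over three peers;
# an informative context lets it concentrate on the true peer (index 0).
uncertain = peer_identification_reward([0.0, 0.0, 0.0], true_peer=0)
confident = peer_identification_reward([5.0, 0.0, 0.0], true_peer=0)
print(uncertain < confident)
```

Under this assumed form, contexts that make the peer easier to identify earn a higher reward, which is the incentive the abstract describes: the agent is pushed to collect informative feedback while it is still uncertain about the peer.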

Adaptive algorithm for multi-agent learning optimal cooperative pursuit strategy based on Markov game

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning

Fast Teammate Adaptation in the Presence of Sudden Policy Change

Self-Motivated Multi-Agent Exploration

Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization

Conservative Offline Policy Adaptation in Multi-Agent Games

Adaptive Agent Architecture for Real-time Human-Agent Teaming

Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks

Efficiently tracking multi-strategic opponents: A context-aware Bayesian policy reuse approach

To be a fast adaptive learner: using game history to defeat opponents

Role Play: Learning Adaptive Role-Specific Strategies in Multi-Agent Interactions

Peer Learning: Learning Complex Policies in Groups from Scratch via Action Recommendations

Never Give Up: Learning Directed Exploration Strategies

Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games

Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition

Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning

In-Context Exploiter for Extensive-Form Games

Hierarchical Deep Reinforcement Learning Agent with Counter Self-play on Competitive Games

Learning from Learners: Adapting Reinforcement Learning Agents to be Competitive in a Card Game