Fast Peer Adaptation with Context-aware Exploration

Long Ma, Yuanfei Wang, Fangwei Zhong, Song-Chun Zhu, Yizhou Wang
2024-08-09
Abstract: Fast adapting to unknown peers (partners or opponents) with different strategies is a key challenge in multi-agent games. To do so, it is crucial for the agent to probe and identify the peer's strategy efficiently, as this is the prerequisite for carrying out the best response in adaptation. However, exploring the strategies of unknown peers is difficult, especially when the games are partially observable and have a long horizon. In this paper, we propose a peer identification reward, which rewards the learning agent based on how well it can identify the behavior pattern of the peer over the historical context, such as the observation over multiple episodes. This reward motivates the agent to learn a context-aware policy for effective exploration and fast adaptation, i.e., to actively seek and collect informative feedback from peers when uncertain about their policies and to exploit the context to perform the best response when confident. We evaluate our method on diverse testbeds that involve competitive (Kuhn Poker), cooperative (PO-Overcooked), or mixed (Predator-Prey-W) games with peer agents. We demonstrate that our method induces more active exploration behavior, achieving faster adaptation and better outcomes than existing methods.
Artificial Intelligence, Machine Learning, Multiagent Systems
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses how an agent can quickly adapt to unknown peers (partners or opponents) in multi-agent games. Specifically:

1. **Rapid Adaptation Challenge**: Agents must quickly identify and adapt to peers that follow different strategies. This requires probing and recognizing the peer's strategy efficiently, since identification is the prerequisite for carrying out the best response.
2. **Exploration in Partially Observable Environments**: Exploring peer strategies is especially hard when games are partially observable and have a long horizon. The paper proposes a "peer identification reward", which rewards the learning agent based on how well it can identify the peer's behavior pattern from the historical context (such as observations over multiple episodes).
3. **Balancing Exploration and Exploitation**: The agent must learn a context-aware policy that actively seeks informative feedback from peers when it is uncertain about their policies, and exploits the context to perform the best response when it is confident.

The resulting method, PACE (Peer Adaptation with Context-aware Exploration), adapts faster and achieves better outcomes than existing methods on competitive (Kuhn Poker), cooperative (PO-Overcooked), and mixed (Predator-Prey-W) testbeds.
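
The idea of a peer identification reward can be illustrated with a minimal sketch. The text here does not give the paper's exact formula, so the form below is an assumption: a hypothetical identifier produces one logit per candidate peer from the context, and the reward is the log-probability it assigns to the true peer. The names `peer_identification_reward`, `shaped_reward`, and the `weight` parameter are illustrative and not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def peer_identification_reward(logits, true_peer_index):
    """Assumed reward form: log-probability assigned to the true peer.

    `logits` stands in for the output of a hypothetical identifier network
    that reads the historical context; it is not the paper's architecture.
    """
    probs = softmax(logits)
    return math.log(probs[true_peer_index])


def shaped_reward(task_reward, id_reward, weight=0.1):
    """Combine the task reward with the identification bonus.

    The additive form and the default weight are assumptions for illustration.
    """
    return task_reward + weight * id_reward


# An uninformative context gives a low reward; an informative one gives a
# reward closer to zero, which favors actions that reveal the peer's strategy.
uncertain = peer_identification_reward([0.0, 0.0, 0.0, 0.0], true_peer_index=2)
confident = peer_identification_reward([0.0, 0.0, 6.0, 0.0], true_peer_index=2)
```

Under this assumed form, the reward is highest when the collected context makes the peer easy to identify, so the agent is pushed to probe while uncertain and can then rely on the context to choose its response.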