Accurate policy detection and efficient knowledge reuse against multi-strategic opponents

Hao Chen, Quan Liu, Ke Fu, Jian Huang, Chang Wang, Jianxing Gong
DOI: https://doi.org/10.1016/j.knosys.2022.108404
2022-04-01
Abstract: In Markov games, how an agent can respond quickly and optimally to opponents that follow changing policies is an open problem. Most state-of-the-art algorithms assume that players change their policies only at the end of an episode, and that the agent obtains the same optimal episodic reward against every opponent policy once it accurately detects that policy. However, the opponent may change its policy within an episode, or switch to an unknown policy. In addition, the optimal return the agent can achieve is likely to differ from one opponent policy to another, which makes policy detection more challenging. To overcome these challenges, this paper proposes an algorithm for accurate opponent policy detection and efficient knowledge reuse. Within an episode, an inter-episode belief and an intra-episode belief are used jointly to continuously infer the opponent's identity, taking into account both the episodic rewards and the opponent models. The agent can then directly reuse the corresponding best response policy. After each episode, we also use performance models to detect whether the opponent has adopted an unknown policy. For a detected unknown opponent type, we model the previously learned policies as options for indirect knowledge reuse. Moreover, an option-based knowledge reuse (OKR) network is introduced to guide the learning of a new response policy by adaptively reusing useful knowledge from the existing learned policies. We demonstrate the advantages of the proposed algorithm over several state-of-the-art algorithms in three competitive scenarios.
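The detection step described in the abstract (an inter-episode belief driven by episodic rewards, an intra-episode belief driven by opponent models, and a performance-model test for unknown opponents) can be illustrated with a minimal Bayesian sketch. The abstract does not give the exact update rules, so everything below is an assumption: the Gaussian performance models follow the common Bayesian Policy Reuse convention, the product-and-renormalize fusion of the two beliefs and the density threshold for flagging an unknown opponent are hypothetical choices, and all function names and the toy opponent types are illustrative only. The OKR network is not sketched here.

```python
import math

def normalize(weights):
    """Rescale non-negative weights so they sum to 1 (uniform if all are zero)."""
    total = sum(weights.values())
    if total <= 0:
        return {k: 1.0 / len(weights) for k in weights}
    return {k: w / total for k, w in weights.items()}

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def update_inter_episode(belief, reward, perf_models):
    """Assumed Bayes update after an episode. perf_models[tau] = (mean, std) of the
    episodic reward the current response policy earns against opponent type tau."""
    return normalize({t: belief[t] * gaussian_pdf(reward, *perf_models[t]) for t in belief})

def update_intra_episode(belief, state, opp_action, opp_models, eps=1e-6):
    """Assumed Bayes update after each step. opp_models[tau](state) returns a
    dict of action probabilities; eps keeps a type from being ruled out forever."""
    return normalize({t: belief[t] * max(opp_models[t](state).get(opp_action, 0.0), eps)
                      for t in belief})

def combined_belief(inter, intra):
    """Hypothetical fusion rule: multiply the two beliefs and renormalize."""
    return normalize({t: inter[t] * intra[t] for t in inter})

def is_unknown_opponent(reward, perf_models, threshold=1e-3):
    """Hypothetical test: flag an unknown type when no known performance
    model assigns the observed episodic reward a density above threshold."""
    return all(gaussian_pdf(reward, *perf_models[t]) < threshold for t in perf_models)

# Toy usage with two hypothetical opponent types.
perf = {"aggressive": (10.0, 2.0), "defensive": (-5.0, 2.0)}
opp = {
    "aggressive": lambda s: {"attack": 0.9, "wait": 0.1},
    "defensive": lambda s: {"attack": 0.2, "wait": 0.8},
}
inter = {t: 0.5 for t in perf}
intra = dict(inter)
for action in ["attack", "attack", "wait"]:  # opponent actions observed mid-episode
    intra = update_intra_episode(intra, None, action, opp)
belief = combined_belief(inter, intra)
best = max(belief, key=belief.get)  # type whose best response policy would be reused
```

In this toy run the intra-episode belief already favours the "aggressive" type before the episode ends, which is the point of combining it with the slower, reward-based inter-episode belief: the agent can switch response policies within an episode rather than waiting for the episodic reward.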
computer science, artificial intelligence
What problem does this paper attempt to address?