From mimic to counteract: a two-stage reinforcement learning algorithm for Google research football

Junjie Zhao, Jiangwen Lin, Xinyan Zhang, Yuanbai Li, Xianzhong Zhou, Yuxiang Sun
DOI: https://doi.org/10.1007/s00521-024-09455-x
2024-02-22
Neural Computing and Applications
Abstract: Deep reinforcement learning has proven effective in various video games, such as Atari games, StarCraft II, Google Research Football (GRF), and Dota II. We participated in the 2022 IEEE Conference on Games Football AI Competition and ranked in the top eight. Despite recent efforts, building agents for GRF remains difficult because of multi-agent coordination, sparse rewards, and stochastic environments. To address these issues and achieve good outcomes in the competition, we devised a reinforcement learning algorithm that combines deep reinforcement learning from demonstrations with policy distillation. In this study, we propose a two-stage algorithm named mimic-to-counteract reinforcement learning (MCRL). Based on the historical game logs of opponents we encountered during the warm-up session, we build partner agents that function like human sparring partners: they simulate opponents with diverse styles of play, enabling the primary agent to practice against a range of policies it may encounter in real competitions. We then train numerous mentor agents capable of restraining the sparring partners, distill their policies, and amalgamate them to train a potent primary agent. Empirical results show that the proposed MCRL algorithm can efficiently search for valuable strategies with stable updates and balance the relationship between policy iteration and policy style deviation. The primary agent also learns diverse but coordinated counteracting strategies and ranked in the top eight in the competition.
computer science, artificial intelligence
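The abstract mentions distilling the policies of several mentor agents into one primary agent but does not state the loss used. As a rough illustration only, the sketch below shows one common form of multi-teacher policy distillation: a weighted average of KL divergences from each mentor's action distribution to the primary agent's distribution for a single observation. The function names, the uniform default weights, and the three-action toy example are all assumptions made here, not details taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw action logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def distillation_loss(student_logits, mentor_logits_list, weights=None):
    """Weighted mean of KL(mentor || student) over all mentors.

    Hypothetical helper: the weighting scheme is an assumption of this
    sketch; uniform weights are used when none are given.
    """
    if weights is None:
        weights = [1.0 / len(mentor_logits_list)] * len(mentor_logits_list)
    student = softmax(student_logits)
    return sum(
        w * kl_divergence(softmax(mentor_logits), student)
        for w, mentor_logits in zip(weights, mentor_logits_list)
    )


if __name__ == "__main__":
    # Toy case: two mentors that prefer different actions, uniform student.
    mentors = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
    print(distillation_loss([0.0, 0.0, 0.0], mentors))
```

Minimizing such a loss pulls the primary agent toward a blend of the mentors' behaviors. In practice it would be averaged over batches of observations and optimized with an automatic differentiation library.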