Abstract: Few multi-agent reinforcement learning (MARL) studies on Google Research Football (GRF) focus on the 11v11 multi-agent full-game scenario, and to the best of our knowledge, no open benchmark on this scenario has been released to the public. In this work, we fill the gap by providing a population-based MARL training pipeline and hyperparameter settings for the multi-agent football scenario that outperform the bot with difficulty 1.0 from scratch within 2 million steps. Our experiments serve as a reference for the expected performance, across various training configurations, of Independent Proximal Policy Optimization (IPPO), a state-of-the-art multi-agent reinforcement learning algorithm in which each agent optimizes its own policy independently. Meanwhile, we open-source our training framework Light-MALib, which extends the MALib codebase with a distributed and asynchronous implementation and additional analytical tools for football games. Finally, we provide guidance for building strong football AI with population-based training and release diverse pretrained policies for benchmarking. The goal is to give the community a head start for anyone experimenting on GRF, along with a simple-to-use population-based training framework for further improving agents through self-play. The implementation is available at https://github.com/Shanghai-Digital-Brain-Laboratory/DB-Football.

What problem does this paper attempt to address?

The main problem this paper addresses is the lack of research and benchmarks for multi-agent reinforcement learning (MARL) in the 11v11 multi-agent full-game scenario of Google Research Football (GRF). Specifically:

1. **Lack of public benchmarks**: To the authors' knowledge, no public benchmark previously existed for the 11v11 multi-agent full-game scenario.
2. **High training difficulty**: Sparse rewards, long episodes, highly stochastic state transitions, and role and credit assignment issues make training multi-agent systems in this scenario extremely challenging.

To address these problems, the authors make the following contributions:

- **A population-based MARL training pipeline**: With this pipeline, the authors train a model from scratch within 2 million steps that outperforms the built-in AI (difficulty 1.0).
- **The open-source training framework Light-MALib**: This framework extends the MALib codebase, supports distributed and asynchronous training, and provides additional analysis tools tailored to football games.
- **Diverse pretrained policies**: These policies can serve as initializations or baselines for future research.

In addition, the authors conduct extensive experiments, compare different training configurations, and offer technical suggestions on how to further improve the AI through self-play. Overall, this research aims to give the community a good starting point for experiments on GRF and an easy-to-use population-based training framework for further strengthening agents.

### Formula Summary

The formulas in this paper mainly describe the Independent Proximal Policy Optimization (IPPO) algorithm and its loss functions:

1. **Objective function**:
   \[ \theta \leftarrow \arg\max_\theta \mathcal{J}(\theta) = \mathbb{E}_{a_t, s_t}\left[\sum_{t} \gamma^t R(s_t, a_t)\right] \]
2. **IPPO policy loss**:
   \[ \mathcal{L}(\theta) = \sum_{i = 1}^{n} \mathbb{E}_{s \sim \rho_{\theta_{\text{old}}}, a \sim \pi_{\theta_{\text{old}}}} \left[ \min \left( \frac{\pi_\theta(a_i|s)}{\pi_{\theta_{\text{old}}}(a_i|s)} \hat{A}_t, \text{clip}\left( \frac{\pi_\theta(a_i|s)}{\pi_{\theta_{\text{old}}}(a_i|s)}, 1-\epsilon, 1+\epsilon \right) \hat{A}_t \right) \right] \]
3. **Advantage estimation**:
   \[ \hat{A}_t = \sum_{l = 0}^{h} (\gamma \lambda)^l \delta_{t + l} \]
   where
   \[ \delta_t = r_t(s_t, a_t) + \gamma V_\phi(s_{t + 1}) - V_\phi(s_t) \]
4. **Value loss function**:
   \[ \mathcal{L}_i(\phi) = \mathbb{E}_{s \sim \rho_{\theta_{\text{old}}}} \left[ \min \left( (V_\phi(s_t) - \hat{V}_t)^2, (V_{\phi_{\text{old}}}(s_t) + \text{clip}(V_\phi(s_t) - V_{\phi_{
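
The advantage estimate and clipped policy loss summarized above can be illustrated with a minimal NumPy sketch. This is not the Light-MALib implementation: the function names `gae_advantages` and `clipped_surrogate` are hypothetical, and the defaults for `gamma`, `lam`, and `eps` are illustrative assumptions rather than the paper's settings. The sketch also assumes a single agent's trajectory with a bootstrap value appended, so the horizon `h` runs to the end of the rollout. Under IPPO, each agent would compute these quantities from its own data.

```python
import numpy as np


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Truncated GAE: A_t = sum_l (gamma * lam)^l * delta_{t+l}.

    rewards has length T; values has length T + 1, where the last entry
    is the bootstrap value of the state after the final step.
    gamma and lam defaults are illustrative assumptions.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    # TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * values[1:] - values[:-1]
    advantages = np.zeros_like(deltas)
    running = 0.0
    # Backward recursion equivalent to the sum: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate for one agent, averaged over samples.

    The probability ratio pi_theta / pi_theta_old is computed from
    log-probabilities. eps default is an illustrative assumption.
    """
    logp_new = np.asarray(logp_new, dtype=np.float64)
    logp_old = np.asarray(logp_old, dtype=np.float64)
    advantages = np.asarray(advantages, dtype=np.float64)
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Elementwise min keeps the pessimistic (lower) estimate per sample
    return float(np.mean(np.minimum(unclipped, clipped)))
```

A training loop would maximize this surrogate (or minimize its negative) with respect to the policy parameters; the sketch only evaluates it for fixed log-probabilities.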

An Empirical Study on Google Research Football Multi-agent Scenarios
