Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination

Rui Zhao, Jinming Song, Yufeng Yuan, Haifeng Hu, Yang Gao, Yi Wu, Zhongqian Sun, Wei Yang
DOI: https://doi.org/10.1609/aaai.v37i5.25758
2023-01-01
Proceedings of the AAAI Conference on Artificial Intelligence
Abstract: We study the problem of training a Reinforcement Learning (RL) agent that is collaborative with humans without using human data. Although such agents can be obtained through self-play training, they can suffer significantly from distributional shift when paired with unencountered partners, such as humans. In this paper, we propose Maximum Entropy Population-based training (MEP) to mitigate such distributional shift. In MEP, agents in the population are trained with our derived Population Entropy bonus to promote the pairwise diversity between agents and the individual diversity of agents themselves. After obtaining this diversified population, a common best agent is trained by pairing with agents in this population via prioritized sampling, where the prioritization is dynamically adjusted based on the training progress. We demonstrate the effectiveness of our method MEP, with comparison to Self-Play PPO (SP), Population-Based Training (PBT), Trajectory Diversity (TrajeDi), and Fictitious Co-Play (FCP) in both matrix game and Overcooked game environments, with partners being human proxy models and real humans. A supplementary video showing experimental results is available at https://youtu.be/Xh-FKD0AAKE.
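The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so everything below is an assumption made for illustration: population entropy is taken to be the entropy of the population's mean action distribution at a single state, and prioritized sampling is taken to be rank-based, favoring partners with which the learner currently earns the lowest return. The function names, the `beta` exponent, and the toy inputs are hypothetical and are not taken from the paper's code.

```python
import numpy as np


def population_entropy(policies, eps=1e-12):
    """Entropy of the mean action distribution of a population at one state.

    Assumption: `policies` is an (n_agents, n_actions) array whose rows are
    probability vectors. This is one plausible reading of a "population
    entropy" bonus, not necessarily the paper's exact definition.
    """
    mean_policy = np.mean(np.asarray(policies, dtype=float), axis=0)
    return float(-np.sum(mean_policy * np.log(mean_policy + eps)))


def prioritized_partner_probs(avg_returns, beta=1.0):
    """Rank-based partner sampling probabilities (hypothetical scheme).

    Partners with which the learner currently achieves a lower average
    return get a higher rank and therefore a higher sampling probability.
    beta = 0 recovers uniform sampling.
    """
    avg_returns = np.asarray(avg_returns, dtype=float)
    n = len(avg_returns)
    # Rank 1 goes to the highest return (easiest partner), rank n to the lowest.
    order = np.argsort(-avg_returns)
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)
    weights = ranks ** beta
    return weights / weights.sum()


if __name__ == "__main__":
    # Two agents that always pick different actions form a diverse population.
    diverse = [[1.0, 0.0], [0.0, 1.0]]
    # Two agents that always pick the same action have no diversity.
    identical = [[1.0, 0.0], [1.0, 0.0]]
    print(population_entropy(diverse), population_entropy(identical))

    # The learner's average return with each of three partners.
    print(prioritized_partner_probs([10.0, 0.0, 5.0], beta=1.0))
```

Under these assumptions, recomputing the sampling probabilities from fresh return estimates during training is what would make the prioritization track training progress: as the learner improves with a hard partner, that partner's rank drops and other partners are sampled more often.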