Reinforcement Learning on Human Decision Models for Uniquely Collaborative AI Teammates

Nicholas Kantack
DOI: https://doi.org/10.48550/arXiv.2111.09800
2021-11-19
Abstract: In 2021 the Johns Hopkins University Applied Physics Laboratory held an internal challenge to develop artificially intelligent (AI) agents that could excel at the collaborative card game Hanabi. Agents were evaluated on their ability to play with human players whom the agents had never previously encountered. This study details the development of the agent that won the challenge by achieving a human-play average score of 16.5, outperforming the current state-of-the-art for human-bot Hanabi scores. The winning agent's development consisted of observing and accurately modeling the author's decision making in Hanabi, then training with a behavioral clone of the author. Notably, the agent discovered a human-complementary play style by first mimicking human decision making, then exploring variations to the human-like strategy that led to higher simulated human-bot scores. This work examines in detail the design and implementation of this human-compatible Hanabi teammate, as well as the existence and implications of human-complementary strategies and how they may be explored for more successful applications of AI in human-machine teams.
Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses how to develop an artificial intelligence (AI) agent that collaborates effectively with human players in the cooperative card game Hanabi. The approach first builds a model of human decision making (a behavioral clone of the author), then uses reinforcement learning with that clone as the partner, so that the agent starts from human-like play and explores variations that raise the simulated human-bot score. The result is a strategy that complements human play rather than merely imitating it. Agents trained by conventional self-play can score well when paired with copies of themselves, but they often perform poorly with human partners, because the conventions they learn emerge from cooperating with their own replicas rather than with humans. The paper therefore focuses on training agents that collaborate well with people by observing and modeling how human players make decisions.
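The idea of mimicking a human first and then searching for a complementary strategy can be illustrated with a deliberately tiny sketch. This is not the author's implementation and it is not Hanabi: it assumes a hypothetical one-shot cooperative game with a made-up payoff table (`PAYOFF`), fits the "behavioral clone" by simple frequency counting, and replaces reinforcement learning with an exhaustive search over the agent's actions. All names and numbers here are illustrative assumptions.

```python
from collections import Counter

# Hypothetical team payoff table (an assumption for illustration only):
# PAYOFF[agent_action][human_action] is the shared score.
PAYOFF = [
    [3, 0, 0],
    [0, 2, 0],
    [4, 0, 1],
]

def fit_clone(observed_actions, n_actions):
    """Behavioral clone as an empirical action distribution."""
    counts = Counter(observed_actions)
    total = len(observed_actions)
    return [counts[a] / total for a in range(n_actions)]

def expected_score(agent_action, clone):
    """Expected team score when the partner acts like the clone."""
    return sum(p * PAYOFF[agent_action][h] for h, p in enumerate(clone))

def improve_on_mimicry(clone):
    """Start from the human's most common action, then keep any
    variation that scores higher with the simulated human partner."""
    start = max(range(len(clone)), key=lambda a: clone[a])
    best = start
    for candidate in range(len(PAYOFF)):
        if expected_score(candidate, clone) > expected_score(best, clone):
            best = candidate
    return start, best

# Logged human decisions (made-up data).
clone = fit_clone([0, 0, 0, 1, 0, 2, 0, 1], n_actions=3)
mimic_action, complementary_action = improve_on_mimicry(clone)
```

In this toy setting the action that copies the human's most frequent choice is not the one that scores best alongside the clone, which is the sense in which a strategy can be human-complementary rather than human-like.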