Abstract: In recent years we have seen fast progress on a number of benchmark problems in AI, with modern methods achieving near- or super-human performance in Go, Poker and Dota. One common aspect of all of these challenges is that they are by design adversarial or, technically speaking, zero-sum. In contrast to these settings, success in the real world commonly requires humans to collaborate and communicate with others, in settings that are, at least partially, cooperative. In the last year, the card game Hanabi has been established as a new benchmark environment for AI to fill this gap. In particular, Hanabi is interesting to humans since it is entirely focused on theory of mind, i.e., the ability to effectively reason over the intentions, beliefs and point of view of other agents when observing their actions. Learning to be informative when observed by others is an interesting challenge for Reinforcement Learning (RL): fundamentally, RL requires agents to explore in order to discover good policies. However, when done naively, this randomness will inherently make their actions less informative to others during training. We present a new deep multi-agent RL method, the Simplified Action Decoder (SAD), which resolves this contradiction by exploiting the centralized training phase. During training, SAD allows other agents to observe not only the (exploratory) action chosen but also the greedy action of their teammates. By combining this simple intuition with best practices for multi-agent learning, SAD establishes a new state of the art (SOTA) for learning methods for 2-5 players on the self-play part of the Hanabi challenge. Our ablations show the contributions of SAD compared with the best practice components. All of our code and trained agents are available at https://github.com/facebookresearch/Hanabi_SAD.
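
The central idea in the abstract, that during training teammates see both the exploratory action actually played and the greedy action the agent would have chosen, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The function names `select_actions` and `teammate_input`, the epsilon-greedy exploration scheme, and the one-hot encoding are all assumptions made for illustration only.

```python
import random
from typing import List, Tuple


def select_actions(q_values: List[float], epsilon: float,
                   rng: random.Random) -> Tuple[int, int]:
    """Return (executed_action, greedy_action) under epsilon-greedy exploration.

    Hypothetical helper: the abstract does not specify the exploration scheme.
    """
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    if rng.random() < epsilon:
        executed = rng.randrange(len(q_values))  # exploratory action
    else:
        executed = greedy
    return executed, greedy


def one_hot(index: int, size: int) -> List[float]:
    """Encode an action index as a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def teammate_input(env_obs: List[float], executed: int, greedy: int,
                   num_actions: int) -> List[float]:
    """Build the teammate's input (illustrative layout, an assumption).

    The environment only ever receives the executed action; the greedy
    action is an extra signal shared between agents during centralized
    training.
    """
    return env_obs + one_hot(executed, num_actions) + one_hot(greedy, num_actions)


rng = random.Random(0)
executed, greedy = select_actions([0.1, 0.9, 0.3], epsilon=0.5, rng=rng)
features = teammate_input([0.5], executed, greedy, num_actions=3)
```

In this sketch, setting `epsilon` to 0 makes the executed and greedy actions coincide, so the extra input carries no additional information once exploration is switched off.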

Reinforcement Learning on Human Decision Models for Uniquely Collaborative AI Teammates

Evaluation of Human-AI Teams for Learned and Rule-Based Agents in Hanabi

Human-AI Coordination via Human-Regularized Search and Learning

More than Task Performance: Developing New Criteria for Successful Human-AI Teaming Using the Cooperative Card Game Hanabi

Collaborating with Humans without Human Data

Natural Language-Based Human–Machine Collaborative Learning Games Algorithm Based on Deep Reinforcement Learning

Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination

Real-World Human-Robot Collaborative Reinforcement Learning

Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning

Human-AI Learning Performance in Multi-Armed Bandits

Enhancing Human Experience in Human-Agent Collaboration: A Human-Centered Modeling Approach Based on Positive Human Gain

Decision-Oriented Dialogue for Human-AI Collaboration

Human-AI Collaboration in a Cooperative Game Setting

Theory of Mind for Deep Reinforcement Learning in Hanabi

Ancillary Mechanism for Autonomous Decision-Making Process in Asymmetric Confrontation: a View from Gomoku

"Other-Play" for Zero-Shot Coordination

Modified Action Decoder Using Bayesian Reasoning for Multi-Agent Deep Reinforcement Learning

Intelligent Decision-Making and Human Language Communication Based on Deep Reinforcement Learning in a Wargame Environment

Adaptive Agent Architecture for Real-time Human-Agent Teaming

Human-AI Teamwork Interface Design Using Patterns of Interactions

Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making