Abstract: In recent years we have seen fast progress on a number of benchmark problems in AI, with modern methods achieving near- or super-human performance in Go, Poker and Dota. One common aspect of all of these challenges is that they are by design adversarial or, technically speaking, zero-sum. In contrast to these settings, success in the real world commonly requires humans to collaborate and communicate with others, in settings that are, at least partially, cooperative. In the last year, the card game Hanabi has been established as a new benchmark environment for AI to fill this gap. In particular, Hanabi is interesting to humans since it is entirely focused on theory of mind, i.e., the ability to effectively reason over the intentions, beliefs and point of view of other agents when observing their actions. Learning to be informative when observed by others is an interesting challenge for Reinforcement Learning (RL): fundamentally, RL requires agents to explore in order to discover good policies. However, when done naively, this randomness will inherently make their actions less informative to others during training. We present a new deep multi-agent RL method, the Simplified Action Decoder (SAD), which resolves this contradiction by exploiting the centralized training phase. During training, SAD allows other agents to observe not only the (exploratory) action chosen but also the greedy action of their teammates. By combining this simple intuition with best practices for multi-agent learning, SAD establishes a new state of the art for learning methods for 2-5 players on the self-play part of the Hanabi challenge. Our ablations show the contributions of SAD compared with the best practice components. All of our code and trained agents are available at https://github.com/facebookresearch/Hanabi_SAD.
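
The core idea in the abstract, that teammates see both the executed (possibly exploratory) action and the greedy action during centralized training, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `select_actions` and `teammate_input`, the flat list of Q-values, the one-hot encoding and the zeroed greedy slot at test time are all assumptions made for illustration.

```python
import random
from typing import List, Tuple


def select_actions(q_values: List[float], epsilon: float,
                   rng: random.Random) -> Tuple[int, int]:
    """Epsilon-greedy selection that also reports the greedy action.

    Returns (executed_action, greedy_action). The executed action is what
    the environment receives; the greedy action is the extra signal that
    teammates may observe during centralized training.
    """
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    if rng.random() < epsilon:
        executed = rng.randrange(len(q_values))  # exploratory action
    else:
        executed = greedy
    return executed, greedy


def one_hot(index: int, size: int) -> List[float]:
    """Encode an action index as a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def teammate_input(observation: List[float], executed: int, greedy: int,
                   num_actions: int, training: bool) -> List[float]:
    """Build the input a teammate receives (hypothetical layout).

    During training the greedy action is appended as an extra one-hot
    block. At test time that block is zeroed here, which is an assumption
    of this sketch, since only the executed action is then observable.
    """
    greedy_block = one_hot(greedy, num_actions) if training \
        else [0.0] * num_actions
    return observation + one_hot(executed, num_actions) + greedy_block


# Usage: with epsilon = 1.0 the executed action is random, but the greedy
# action still carries the agent's informative choice to its teammates.
rng = random.Random(0)
executed, greedy = select_actions([0.1, 0.9, 0.3], epsilon=1.0, rng=rng)
features = teammate_input([0.5, 0.5], executed, greedy,
                          num_actions=3, training=True)
```

Because the greedy action is unaffected by exploration noise, the teammate's input stays informative even when epsilon is large, which is the contradiction the abstract describes SAD as resolving.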
