Learning in Games with Progressive Hiding

Benjamin Heymann, Marc Lanctot
2024-10-09
Abstract: When learning to play an imperfect information game, it is often easier to first start with the basic mechanics of the game rules. For example, one can play several example rounds with private cards revealed to all players to better understand the basic actions and their effects. Building on this intuition, this paper introduces *progressive hiding*, an algorithm that learns to play imperfect information games by first learning the basic mechanics and then progressively adding information constraints over time. Progressive hiding is inspired by methods from stochastic multistage optimization such as scenario decomposition and progressive hedging. We prove that it enables the adaptation of counterfactual regret minimization to games where perfect recall is not satisfied. Numerical experiments illustrate that progressive hiding can achieve optimal payoff in a benchmark of emergent communication trading game.
Computer Science and Game Theory
What problem does this paper attempt to address?
The paper asks how to make learning in imperfect information games easier by hiding information gradually. The authors propose an algorithm called "progressive hiding", which trains agents by first letting them learn the basic mechanics of the game and then progressively tightening the information constraints.

### Main Problem Description

1. **Learning Challenges in Imperfect Information Games**:
   - In an imperfect information game, players cannot fully observe the state of the game. In poker, for example, a player sees only their own hand and does not know the hands of the other players.
   - Standard learning methods such as counterfactual regret minimization (CFR) usually assume perfect recall, meaning that each player remembers everything they have observed and done. When this assumption fails, these methods cannot be applied directly.

2. **Limitations of Existing Methods**:
   - The standard CFR algorithm relies on the perfect recall assumption. In some games (such as Hanabi) this assumption does not hold, so new methods are needed.

3. **Motivation for the Progressive Hiding Algorithm**:
   - Inspired by the way teaching introduces complexity step by step, the authors propose the progressive hiding algorithm. It first relaxes the information constraints, so that players learn the basic rules in a simpler, information-transparent setting, and then gradually increases how much information is hidden.
   - This mirrors how children are taught to play games: they first play a few rounds with all information revealed to understand the basic actions and their effects, and hidden information is introduced afterwards.
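
To make the reference to CFR concrete, the following is a minimal sketch of regret matching, the per-information-set update that CFR applies at every decision point. The function name and the example regrets are illustrative assumptions, not taken from the paper.

```python
def regret_matching(cumulative_regret):
    """Return a strategy proportional to the positive part of the cumulative regrets."""
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    # No action has positive regret: fall back to the uniform strategy.
    n = len(cumulative_regret)
    return [1.0 / n] * n


# Illustrative cumulative regrets for three actions at one information set.
strategy = regret_matching([2.0, -1.0, 6.0])
print(strategy)  # [0.25, 0.0, 0.75]
```

The update is indexed by information set, which is why the definition of what a player remembers (perfect recall) matters for the guarantees of CFR.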
### Solution Overview

- **Progressive Hiding Algorithm**:
  - The algorithm combines ideas from no-regret learning and information relaxation.
  - It introduces an auxiliary game in which the information constraints are tightened gradually, so that players receive sufficient feedback at each stage and can keep improving their strategies.
  - In terms of implementation, the algorithm uses a projection step and a proximal step, and it controls the degree of information hiding by adjusting a penalty function.
- **Theoretical Contributions**:
  - The authors prove that, under certain conditions, the progressive hiding algorithm recovers the main properties of CFR and can achieve optimal returns in the auxiliary game.
  - Experimental results show that the progressive hiding algorithm performs well in multiple benchmark tests, especially in the Trade Comm game, where it reaches the optimal return more quickly.

### Conclusion

By introducing the progressive hiding algorithm, this paper provides an effective method for the learning challenges posed by imperfect information games. The method is supported by theoretical proofs and also performs well in experiments, and it offers new ideas and tools for future research in more complex game environments.
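
The projection step and proximal step mentioned in the solution overview can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not the algorithm from the paper: it uses a quadratic cost, a hand-picked penalty schedule `rho`, and a plain penalty without the dual multipliers that progressive hedging normally carries. All names (`project`, `proximal_step`, `card_low`, `card_high`) are hypothetical.

```python
def project(actions, info_sets):
    """Average the actions over each information set, i.e. over states
    that the player cannot tell apart once information is hidden."""
    projected = dict(actions)
    for states in info_sets:
        mean = sum(actions[s] for s in states) / len(states)
        for s in states:
            projected[s] = mean
    return projected


def proximal_step(targets, anchor, rho):
    """Closed-form minimiser of (x - target)^2 + (rho / 2) * (x - anchor)^2
    for each state: x = (2 * target + rho * anchor) / (2 + rho)."""
    return {s: (2.0 * targets[s] + rho * anchor[s]) / (2.0 + rho) for s in targets}


# Best action in each hidden state if the private card were revealed.
targets = {"card_low": 0.0, "card_high": 4.0}
# Once information is hidden, both states fall in one information set.
info_sets = [["card_low", "card_high"]]

actions = dict(targets)  # start from the fully revealed solution
for rho in [0.0, 1.0, 10.0, 100.0, 1000.0]:  # growing penalty hides more
    anchor = project(actions, info_sets)
    actions = proximal_step(targets, anchor, rho)
    gap = abs(actions["card_high"] - actions["card_low"])
    print(f"rho={rho:7.1f}  gap between states={gap:.4f}")

final = project(actions, info_sets)
```

With `rho = 0` the player acts as if the card were visible. As `rho` grows, the per-state actions are pulled toward their information-set average, which in this toy problem is the best action that does not depend on the hidden card.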