Abstract: In zero-sum games, the optimal strategy is well-defined by the Nash equilibrium. However, it is overly conservative when playing against suboptimal opponents and it cannot exploit their weaknesses. Limited look-ahead game solving in imperfect-information games allows defeating human experts in massive real-world games such as Poker, Liar's Dice, and Scotland Yard. However, since these methods approximate a Nash equilibrium, they tend to win only slightly against weak opponents. We propose methods combining limited look-ahead solving with an opponent model in order to 1) approximate a best response in large games or 2) compute a robust response with control over the robustness of the response. Both methods can compute the response in real time to previously unseen strategies. We present theoretical guarantees of our methods. We show that existing robust response methods do not work off the shelf when combined with limited look-ahead solving, and we propose a novel solution for the issue. Our algorithm performs significantly better than multiple baselines in smaller games and outperforms state-of-the-art methods against SlumBot.

What problem does this paper attempt to address?

The paper addresses the problem of designing algorithms, with theoretical guarantees, that target suboptimal opponents in imperfect-information games. Specifically, it focuses on how to exploit the weaknesses of suboptimal opponents in zero-sum games, where Nash equilibrium strategies are too conservative to take advantage of the opponent's mistakes. To solve this problem, the paper proposes several methods:

1. **Continual Depth-Bounded Best Response (CDBR)**: This method combines depth-bounded search with an opponent model to approximate the best response in large games. It allows a player to compute responses to previously unseen strategies in real time.
2. **Robust Response**: When the opponent model is uncertain, a robust response lets the player control how exploitable the response is. The paper discusses two robust response methods:
   - **CDBR-NE**: A linear combination of the CDBR and Nash equilibrium strategies, which can be used to limit the exploitability of the resulting strategy.
   - **Continual Depth-Bounded Restricted Nash Response (CDRNR)**: This method better balances the trade-off between payoff and exploitability and allows for safe control of parameters.
3. **Addressing Issues in Depth-Bounded Search**: To ensure the theoretical correctness of the above methods, the paper also examines issues that arise when an opponent model is used in depth-bounded search and proposes a solution called "full gadget." This solution keeps the path from previously solved subgames to the root node and uses a value function for evaluation when leaving these solved parts.

With these methods, the paper provides strategies for targeting suboptimal opponents and supports them with both theoretical guarantees and experiments. According to the abstract, the algorithm performs significantly better than multiple baselines in smaller games and outperforms state-of-the-art methods against SlumBot.
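The payoff versus exploitability trade-off behind the CDBR-NE idea can be illustrated on a toy example. The sketch below is not the paper's algorithm: it uses plain rock-paper-scissors as a normal-form game, an exact best response in place of a depth-bounded one, and a hypothetical opponent model (`MODEL`) that overplays rock. All names (`PAYOFF`, `best_response`, `mix`, `trade_off`) are assumptions introduced for illustration only.

```python
# Toy illustration (NOT the paper's depth-bounded algorithm): mixing a best
# response with a Nash strategy in rock-paper-scissors.

# Row player's payoff; rows and columns are ordered (rock, paper, scissors).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

NASH = [1 / 3, 1 / 3, 1 / 3]   # the unique equilibrium strategy; game value is 0
MODEL = [0.5, 0.25, 0.25]      # hypothetical opponent model that overplays rock


def value_against(strategy, opponent):
    """Expected row-player payoff of `strategy` against a fixed `opponent`."""
    return sum(
        strategy[i] * opponent[j] * PAYOFF[i][j]
        for i in range(3)
        for j in range(3)
    )


def best_response(opponent):
    """Pure strategy maximizing the expected payoff against `opponent`."""
    values = [sum(opponent[j] * PAYOFF[i][j] for j in range(3)) for i in range(3)]
    best = max(range(3), key=values.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(3)]


def exploitability(strategy):
    """Game value (0 here) minus the payoff against a worst-case opponent."""
    worst = min(sum(strategy[i] * PAYOFF[i][j] for i in range(3)) for j in range(3))
    return 0.0 - worst


def mix(p, exploit, safe):
    """Play `exploit` with probability p and `safe` with probability 1 - p."""
    return [p * e + (1 - p) * s for e, s in zip(exploit, safe)]


def trade_off(p):
    """Return (gain against the model, exploitability) for mixing weight p."""
    strategy = mix(p, best_response(MODEL), NASH)
    return value_against(strategy, MODEL), exploitability(strategy)


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        gain, expl = trade_off(p)
        print(f"p={p:.1f}  gain vs model={gain:.3f}  exploitability={expl:.3f}")
```

In this toy game both quantities grow linearly with the mixing weight `p`: a larger `p` wins more against the modeled opponent but loses more if the model is wrong. This is the kind of trade-off that the robust response methods described above are meant to control.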

Continual Depth-limited Responses for Computing Counter-strategies in Sequential Games
