Continual Depth-limited Responses for Computing Counter-strategies in Sequential Games

David Milec, Ondřej Kubíček, Viliam Lisý
2024-04-03
Abstract: In zero-sum games, the optimal strategy is well-defined by the Nash equilibrium. However, it is overly conservative when playing against suboptimal opponents and it cannot exploit their weaknesses. Limited look-ahead game solving in imperfect-information games allows defeating human experts in massive real-world games such as Poker, Liar's Dice, and Scotland Yard. However, since these methods approximate a Nash equilibrium, they tend to win only slightly against weak opponents. We propose methods combining limited look-ahead solving with an opponent model in order to 1) approximate a best response in large games or 2) compute a robust response with control over the robustness of the response. Both methods can compute the response in real time to previously unseen strategies. We present theoretical guarantees of our methods. We show that existing robust response methods do not work off the shelf when combined with limited look-ahead solving, and we propose a novel solution for the issue. Our algorithm performs significantly better than multiple baselines in smaller games and outperforms state-of-the-art methods against SlumBot.
Computer Science and Game Theory
What problem does this paper attempt to address?
The paper addresses how to exploit suboptimal opponents in imperfect-information zero-sum games using algorithms that come with theoretical guarantees. Nash equilibrium strategies are too conservative for this purpose because they do not take advantage of an opponent's mistakes. The paper proposes the following:

1. **Continual Depth-limited Best Response (CDBR)**: combines depth-limited search with an opponent model to approximate a best response in large games. Responses to previously unseen strategies can be computed in real time.
2. **Robust response**: when the opponent model is uncertain, a robust response limits how much the computed strategy can itself be exploited. The paper discusses two variants:
   - **CDBR-NE**: a linear combination of CDBR and a Nash equilibrium strategy, which can be used to limit the exploitability of the resulting strategy.
   - **Continual Depth-limited Restricted Nash Response (CDRNR)**: offers a better trade-off between payoff and exploitability, with control over the robustness of the response.
3. **Addressing issues in depth-limited search**: to keep the above methods theoretically sound, the paper examines the problems that arise when an opponent model is used in depth-limited search and proposes a solution called the "full gadget." It maintains the path from previously solved subgames to the root node and uses a value function for evaluation when leaving these solved parts.

The paper thus provides strategies for exploiting suboptimal opponents that are both theoretically sound and experimentally validated. The algorithms perform significantly better than multiple baselines in smaller games and outperform state-of-the-art methods against SlumBot.
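The trade-off between exploiting an opponent model and staying safe can be illustrated in a far simpler setting than the paper's. The sketch below is not the paper's algorithm: it uses one-shot rock-paper-scissors instead of a sequential game, has no depth-limited search, and assumes a hypothetical opponent model that overplays rock. All function names are illustrative. It computes a best response to the model, mixes it linearly with the Nash equilibrium (the idea behind CDBR-NE), and reports gain and exploitability for several mixing weights.

```python
# Rock-paper-scissors payoffs for the row player (zero-sum).
# Rows and columns are ordered rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
N = len(PAYOFF)


def expected_value(row_strategy, col_strategy):
    """Expected payoff of the row player for two mixed strategies."""
    return sum(
        row_strategy[i] * col_strategy[j] * PAYOFF[i][j]
        for i in range(N)
        for j in range(N)
    )


def best_response(col_strategy):
    """Pure row strategy with the highest value against a fixed column strategy."""
    values = [
        sum(col_strategy[j] * PAYOFF[i][j] for j in range(N)) for i in range(N)
    ]
    best = max(range(N), key=lambda i: values[i])
    return [1.0 if i == best else 0.0 for i in range(N)]


def exploitability(row_strategy):
    """Loss against a worst-case opponent, relative to the game value (0 here)."""
    col_values = [
        sum(row_strategy[i] * PAYOFF[i][j] for i in range(N)) for j in range(N)
    ]
    return 0.0 - min(col_values)


def mix(a, b, weight_on_a):
    """Linear combination of two mixed strategies."""
    return [weight_on_a * x + (1.0 - weight_on_a) * y for x, y in zip(a, b)]


NASH = [1 / 3, 1 / 3, 1 / 3]
OPPONENT_MODEL = [0.5, 0.25, 0.25]  # assumption: opponent overplays rock

if __name__ == "__main__":
    br = best_response(OPPONENT_MODEL)
    for weight in (0.0, 0.5, 1.0):
        strategy = mix(br, NASH, weight)
        gain = expected_value(strategy, OPPONENT_MODEL)
        print(f"weight={weight:.1f} gain={gain:.3f} "
              f"exploitability={exploitability(strategy):.3f}")
```

In this toy game the pure best response (always paper) gains 0.25 per round against the model but can lose 1 per round to an opponent who switches to scissors. An even mix with the Nash equilibrium halves both numbers. Gain and exploitability scale linearly with the mixing weight here; according to the summary above, CDRNR is proposed because it offers a better trade-off between the two.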