Abstract: Regret minimization has proved to be a versatile tool for tree-form sequential decision making and extensive-form games. In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium. Most regret-minimization algorithms for tree-form sequential decision making, including CFR, require two things. The first is (i) an exact model of the player's decision nodes, observation nodes, and how they are linked. The second is (ii) full knowledge, at all times t, of the payoffs, even in parts of the decision space that are not encountered at time t. Recently, there has been growing interest in relaxing some of these restrictions and making regret minimization applicable to settings where reinforcement learning methods have traditionally been used, such as settings in which only black-box access to the environment is available. To our knowledge, we give the first regret-minimization algorithm that guarantees sublinear regret with high probability even when requirement (i), and thus also requirement (ii), is dropped. We formalize an online learning setting in which the strategy space is not known to the agent and is revealed incrementally whenever the agent encounters new decision points. We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability in that setting, even when the agent faces an adversarial environment. Our experiments show that it significantly outperforms prior algorithms for the problem, which lack such guarantees. The algorithm can be used in any application for which regret minimization is useful: approximating a Nash equilibrium or quantal response equilibrium, approximating a coarse correlated equilibrium in multi-player games, learning a best response, learning safe opponent exploitation, and playing online against an unknown opponent or environment.
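To make the setting concrete, the sketch below shows how a strategy space can be "revealed incrementally": a local regret minimizer is created only when a decision point is first encountered, rather than being allocated up front from a full model of the tree. This is an illustrative sketch, not the authors' algorithm. It uses plain regret matching at each decision point and assumes the per-action utilities passed to `observe` are already available (for example, as estimates). It does not implement the sampling and exploration scheme needed for the $O(T^{3/4})$ high-probability guarantee. The class names `RegretMatcher` and `LazyDecisionPoints` are hypothetical.

```python
class RegretMatcher:
    """Regret matching at a single decision point with a fixed action set."""

    def __init__(self, num_actions):
        # Cumulative regret for each action, initially zero.
        self.cum_regret = [0.0] * num_actions

    def strategy(self):
        # Play actions in proportion to their positive cumulative regret;
        # fall back to uniform when no action has positive regret.
        positive = [max(r, 0.0) for r in self.cum_regret]
        total = sum(positive)
        n = len(positive)
        if total <= 0.0:
            return [1.0 / n] * n
        return [p / total for p in positive]

    def observe(self, utilities):
        # utilities[a] is the (assumed given, possibly estimated) utility of action a.
        sigma = self.strategy()
        expected = sum(s * u for s, u in zip(sigma, utilities))
        for a, u in enumerate(utilities):
            self.cum_regret[a] += u - expected


class LazyDecisionPoints:
    """Hypothetical container that creates a regret minimizer per decision point
    only on first encounter, so no model of the full tree is needed in advance."""

    def __init__(self):
        self.minimizers = {}

    def get(self, decision_point, num_actions):
        # A previously unseen decision point is "revealed" here and gets its own minimizer.
        if decision_point not in self.minimizers:
            self.minimizers[decision_point] = RegretMatcher(num_actions)
        return self.minimizers[decision_point]
```

For example, after `pts = LazyDecisionPoints()` and `rm = pts.get("root", 2)`, the initial strategy is uniform. Calling `rm.observe([1.0, 0.0])` shifts all probability to the first action. A decision point such as `"after_left"` only enters `pts.minimizers` the first time `pts.get("after_left", 3)` is called.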