Abstract: Model-based algorithms -- algorithms that explore the environment through building and utilizing an estimated model -- are widely used in reinforcement learning practice and are theoretically shown to achieve optimal sample efficiency for single-agent reinforcement learning in Markov Decision Processes (MDPs). However, for multi-agent reinforcement learning in Markov games, the current best known sample complexity for model-based algorithms is rather suboptimal and compares unfavorably against recent model-free approaches. In this paper, we present a sharp analysis of model-based self-play algorithms for multi-agent Markov games. We design an algorithm, Optimistic Nash Value Iteration (Nash-VI), for two-player zero-sum Markov games that outputs an $\epsilon$-approximate Nash policy in $\tilde{\mathcal{O}}(H^3SAB/\epsilon^2)$ episodes of game playing, where $S$ is the number of states, $A$ and $B$ are the numbers of actions for the two players respectively, and $H$ is the horizon length. This significantly improves over the best known model-based guarantee of $\tilde{\mathcal{O}}(H^4S^2AB/\epsilon^2)$, and is the first that matches the information-theoretic lower bound $\Omega(H^3S(A+B)/\epsilon^2)$ except for a $\min\{A,B\}$ factor. In addition, our guarantee compares favorably against the best known model-free algorithm if $\min\{A,B\}=o(H^3)$, and the algorithm outputs a single Markov policy, whereas existing sample-efficient model-free algorithms output a nested mixture of Markov policies that is in general non-Markov and rather inconvenient to store and execute. We further adapt our analysis to design a provably efficient task-agnostic algorithm for zero-sum Markov games, and to design the first line of provably sample-efficient algorithms for multi-player general-sum Markov games.
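
The two comparisons stated in the abstract can be checked by dividing the bounds directly. This is only a rough sketch: it treats the bounds as exact expressions and ignores the logarithmic factors hidden by $\tilde{\mathcal{O}}$, which is an assumption made for illustration, and it assumes $A, B \ge 1$.

$$
\frac{H^4 S^2 AB/\epsilon^2}{H^3 SAB/\epsilon^2} = HS,
\qquad
\frac{H^3 SAB/\epsilon^2}{H^3 S(A+B)/\epsilon^2} = \frac{AB}{A+B} \in \left[\tfrac{1}{2}\min\{A,B\},\ \min\{A,B\}\right).
$$

Under these assumptions, the improvement over the earlier model-based guarantee is a factor of $HS$, and the remaining gap to the lower bound is of order $\min\{A,B\}$, since $\frac{AB}{A+B} = \min\{A,B\}\cdot\frac{\max\{A,B\}}{A+B}$ and the last fraction lies in $[\tfrac{1}{2}, 1)$.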

Approximating Auction Equilibria with Reinforcement Learning

Equilibrium Learning in Combinatorial Auctions: Computing Approximate Bayesian Nash Equilibria via Pseudogradient Dynamics

Deep Reinforcement Learning for Strategic Bidding in Electricity Markets

Approximate Nash Equilibrium Learning for n-Player Markov Games in Dynamic Pricing

Computing Bayes Nash Equilibrium Strategies in Auction Games via Simultaneous Online Dual Averaging

Verifying Approximate Equilibrium in Auctions

Monte Carlo Neural Fictitious Self-Play: Achieve Approximate Nash equilibrium of Imperfect-Information Games.

Using Multi-Agent Reinforcement Learning in Auction Simulations

Equilibrium Computation in Multi-Stage Auctions and Contests

Learning to Bid Long-Term: Multi-Agent Reinforcement Learning with Long-Term and Sparse Reward in Repeated Auction Games

Deep Reinforcement Learning for Sequential Combinatorial Auctions

Learning Best Response Policies in Dynamic Auctions via Deep Reinforcement Learning

Efficient Competitive Self-Play Policy Optimization

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Computational Performance of Deep Reinforcement Learning to find Nash Equilibria

A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

Using Reinforcement Learning to Validate Empirical Game-Theoretic Analysis: A Continuous Double Auction Study

Understanding Iterative Combinatorial Auction Designs via Multi-Agent Reinforcement Learning

Monte Carlo Neural Fictitious Self-Play: Approach to Approximate Nash equilibrium of Imperfect-Information Games

Self-Confirming Price Prediction Strategies for Simultaneous One-Shot Auctions