Abstract: This cumulative dissertation contains four self-contained chapters on stochastic games and learning in intertemporal choice. Chapter 1 presents an experiment on value learning in a setting where actions have both immediate and delayed consequences. Subjects make a series of choices between abstract options whose values have to be learned by sampling. Each option is associated with two payoff components: one is revealed immediately after the choice, the other with a one-round delay. Objectively, both payoff components are equally important, but most subjects systematically underreact to the delayed consequences. The resulting behavior appears impatient or myopic. However, there is no inherent reason to discount: all rewards are paid simultaneously, after the experiment. Elicited beliefs on the value of options are in accordance with choice behavior. These results demonstrate that revealed impatience may arise from frictions in learning, and that discounting does not necessarily reflect deep time preferences. In a treatment variation, subjects first learn passively from the evidence generated by others before making a series of choices of their own. Here, the underweighting of delayed consequences is attenuated, in particular for the earliest of their own decisions. Active decision making thus seems to play an important role in the emergence of the observed bias. Chapter 2 introduces Markov quantal response equilibrium (QRE), an application of QRE to finite discounted stochastic games, and proves its existence. We then study a specific case, logit Markov QRE, which arises when players react to total discounted payoffs using the logit choice rule with precision parameter λ. We show that the set of logit Markov QRE always contains a smooth path that leads from the unique QRE at λ = 0 to a stationary equilibrium of the game as λ goes to infinity. 
Following this path makes it possible to solve arbitrary finite discounted stochastic games numerically; an implementation of this algorithm is publicly available as part of the package sgamesolver. We further show that all logit Markov QRE are ε-equilibria, with a bound for ε that is independent of the payoff function of the game and decreases hyperbolically in λ. Finally, we establish a link to reinforcement learning by characterizing logit Markov QRE as the stationary points of a game dynamic that arises when all players follow the well-established reinforcement learning algorithm expected SARSA. Chapter 3 introduces the logarithmic stochastic tracing procedure, a homotopy method to compute stationary equilibria of finite discounted stochastic games. We build on the linear stochastic tracing procedure (Herings and Peeters 2004), but introduce logarithmic penalty terms as a regularization device, which brings two major improvements. First, the scope of the method is extended: it now has a convergence guarantee for all games of this class, rather than just generic ones. Second, by ensuring a smooth and interior solution path, computational performance is increased significantly. A ready-to-use implementation is publicly available. As demonstrated here, its speed compares quite favorably to other available algorithms, and it can solve games of considerable size in reasonable time. Because the method involves the gradual transformation of a prior into equilibrium strategies, it is possible to search the prior space and uncover potentially multiple equilibria and their respective basins of attraction. This also connects the method to the established theory of equilibrium selection. Chapter 4 introduces sgamesolver, a Python package that uses the homotopy method to compute stationary equilibria of finite discounted stochastic games. 
A short user guide is complemented by a discussion of the homotopy method, the two implemented homotopy functions (logit Markov QRE and logarithmic tracing), and the predictor-corrector procedure and its implementation in sgamesolver. Basic and advanced use cases are demonstrated using several example games. Finally, we discuss the topic of symmetries in stochastic games.
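
The logit choice rule referred to in Chapter 2 can be illustrated with a minimal sketch. This is not taken from the dissertation or from sgamesolver: the function name `logit_choice` and the payoff numbers are hypothetical, and the sketch covers only a single decision over fixed payoffs, not a full stochastic game. It shows the two limits named in the abstract: at λ = 0 all actions are equally likely, and as λ grows the probability mass concentrates on the action with the highest payoff.

```python
import math


def logit_choice(payoffs, lam):
    """Return logit choice probabilities for a list of payoffs and precision lam."""
    # Subtract the maximum payoff before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    highest = max(payoffs)
    weights = [math.exp(lam * (u - highest)) for u in payoffs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical total discounted payoffs of three actions (illustration only).
payoffs = [1.0, 2.0, 0.5]

uniform = logit_choice(payoffs, 0.0)   # lam = 0: uniform over actions
sharp = logit_choice(payoffs, 50.0)    # large lam: close to a best response
```

In a Markov QRE the payoffs entering this rule would themselves depend on all players' strategies through the discounted continuation values, which is why an equilibrium has to be computed as a fixed point rather than by a single evaluation.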

Unlearnable Games and "Satisficing" Decisions: A Simple Model for a Complex World

Strategic Teaching and Learning in Games

Learning in Games: Neural Computations Underlying Strategic Learning

Cyclic game dynamics driven by iterated reasoning

Cycles of cooperation and defection in imperfect learning

The equivalence of dynamic and strategic stability under regularized learning in games

Endogenous Barriers to Learning

Learning in Multi-Objective Public Goods Games with Non-Linear Utilities

Catastrophe by Design in Population Games: Destabilizing Wasteful Locked-in Technologies

Learning Probably Approximately Correct Maximin Strategies in Simulation-Based Games with Infinite Strategy Spaces

Ignorance is Bliss: A Game of Regret

Learning in Markets: Greed Leads to Chaos but Following the Price is Right

Essays on stochastic games and learning in intertemporal choice

Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games

Learning in a complex world: Insights from an OLG lab experiment

Social Optimum Equilibrium Selection for Distributed Multi-Agent Optimization

Evolutionary Game Theory Squared: Evolving Agents in Endogenously Evolving Zero-Sum Games

Testing Models of Strategic Uncertainty: Equilibrium Selection in Repeated Games

Penalty-Regulated Dynamics and Robust Learning Procedures in Games

When Does Learning in Games Generate Convergence to Nash Equilibria? The Role of Supermodularity in an Experimental Setting