Abstract: Using simulations between pairs of $\epsilon$-greedy q-learners with one-period memory, this article demonstrates that the potential function of the stochastic replicator dynamics (Foster and Young, 1990) can be used to predict the emergence of error-proof cooperative strategies from the underlying parameters of the repeated prisoner's dilemma. The observed cooperation rates between q-learners are related to the ratio between the kinetic energy exerted by the polar attractors of the replicator dynamics under the grim trigger strategy. The frontier separating the parameter space conducive to cooperation from the parameter space dominated by defection can be found by setting the kinetic energy ratio equal to a critical value, which is a function of the discount factor, $f(\delta) = \delta/(1-\delta)$, multiplied by a correction term that accounts for the effect of the algorithms' exploration probability. The gradient at the frontier increases with the distance between the game parameters and the hyperplane that characterizes the incentive compatibility constraint for cooperation under grim trigger. Building on literature from the neurosciences, which suggests that reinforcement learning is useful for understanding human behavior in risky environments, the article further explores the extent to which the frontier derived for q-learners also explains the emergence of cooperation between humans. Using metadata from laboratory experiments that analyze human choices in the infinitely repeated prisoner's dilemma, the cooperation rates between humans are compared to those observed between q-learners under similar conditions. The correlation coefficients between the cooperation rates observed for humans and those observed for q-learners are consistently above $0.8$. The frontier derived from the simulations between q-learners is also found to predict the emergence of cooperation between humans.
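As a rough illustration of two quantities named in the abstract, the sketch below computes the discount-factor term $f(\delta) = \delta/(1-\delta)$ and checks the textbook incentive compatibility constraint for cooperation under grim trigger. The payoff names `T` (temptation), `R` (mutual cooperation), and `P` (mutual defection), as well as both function names, are assumptions introduced here for illustration. The abstract does not give the form of the exploration correction term or of the kinetic energy ratio, so neither is modeled.

```python
def critical_value(delta: float) -> float:
    """Discount-factor term f(delta) = delta / (1 - delta) from the abstract."""
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    return delta / (1.0 - delta)


def grim_trigger_sustainable(T: float, R: float, P: float, delta: float) -> bool:
    """Textbook grim-trigger condition (an assumption, not the paper's frontier).

    Cooperating forever yields R / (1 - delta); deviating once yields
    T + delta * P / (1 - delta). With T > P, cooperation is incentive
    compatible exactly when delta >= (T - R) / (T - P).
    """
    if not T > R > P:
        raise ValueError("expected prisoner's dilemma ordering T > R > P")
    return delta >= (T - R) / (T - P)
```

For example, with hypothetical payoffs `T=5, R=3, P=1`, the constraint binds at a discount factor of 0.5, where `critical_value` equals 1.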

Symmetric equilibrium of multi-agent reinforcement learning in repeated prisoner's dilemma

Memory-two strategies forming symmetric mutual reinforcement learning equilibrium in repeated prisoners' dilemma game

Adaptive algorithm for multi-agent learning optimal cooperative pursuit strategy based on Markov game

A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

Win-Stay-Lose-Shift as a self-confirming equilibrium in the iterated Prisoner's Dilemma

Towards Cooperation in Sequential Prisoner's Dilemmas: a Deep Multiagent Reinforcement Learning Approach

Multi-agent Reinforcement Learning in Sequential Social Dilemmas

Learning multiagent coordination in the absence of communication channels

Exploring Dominant Strategies in Iterated and Evolutionary Games: a Multi-Agent Reinforcement Learning Approach

Memory-two zero-determinant strategies in repeated games

Learning in Multi-Memory Games Triggers Complex Dynamics Diverging from Nash Equilibrium

Emergence of cooperation in two-agent repeated games with reinforcement learning

Reinforcement Learning Produces Dominant Strategies for the Iterated Prisoner's Dilemma

Social Optimum Equilibrium Selection for Distributed Multi-Agent Optimization

Equilibrium Selection for Multi-agent Reinforcement Learning: A Unified Framework

On the Emergence of Cooperation in the Repeated Prisoner's Dilemma

A Risk-Averse Equilibrium for Multi-Agent Systems

Taming Equilibrium Bias in Risk-Sensitive Multi-Agent Reinforcement Learning

The Iterated Prisoner's Dilemma: Good Strategies and Their Dynamics