Abstract: Using simulations between pairs of $\epsilon$-greedy Q-learners with one-period memory, this article demonstrates that the potential function of the stochastic replicator dynamics (Foster and Young, 1990) can be used to predict the emergence of error-proof cooperative strategies from the underlying parameters of the repeated prisoner's dilemma. The observed cooperation rates between Q-learners are related to the ratio of the kinetic energies exerted by the polar attractors of the replicator dynamics under the grim trigger strategy. The frontier separating the parameter space conducive to cooperation from the parameter space dominated by defection can be found by setting the kinetic energy ratio equal to a critical value, which is a function of the discount factor, $f(\delta) = \delta/(1-\delta)$, multiplied by a correction term that accounts for the effect of the algorithms' exploration probability. The gradient at the frontier increases with the distance between the game parameters and the hyperplane that characterizes the incentive compatibility constraint for cooperation under grim trigger. Building on literature from the neurosciences, which suggests that reinforcement learning is useful for understanding human behavior in risky environments, the article further explores the extent to which the frontier derived for Q-learners also explains the emergence of cooperation between humans. Using metadata from laboratory experiments that analyze human choices in the infinitely repeated prisoner's dilemma, the cooperation rates between humans are compared to those observed between Q-learners under similar conditions. The correlation coefficients between the cooperation rates observed for humans and those observed for Q-learners are consistently above $0.8$. The frontier derived from the simulations between Q-learners is also found to predict the emergence of cooperation between humans.
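To make the simulated setting concrete, the following is a minimal sketch of two $\epsilon$-greedy Q-learners with one-period memory playing the repeated prisoner's dilemma. It is not the article's implementation: the payoff values, the learning rate `alpha`, the constant exploration probability, the initial state, and all function names are assumptions chosen for illustration. The helper `critical_value` evaluates $f(\delta) = \delta/(1-\delta)$ as given above, and `grim_trigger_min_delta` uses the textbook grim-trigger incentive constraint rather than any formula taken from the article.

```python
import random

# Illustrative payoffs (assumed, not from the article): T > R > P > S.
T, R, P, S = 5.0, 3.0, 1.0, 0.0
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}
ACTIONS = ("C", "D")


def critical_value(delta):
    """f(delta) = delta / (1 - delta), as stated in the abstract."""
    return delta / (1.0 - delta)


def grim_trigger_min_delta(t=T, r=R, p=P):
    """Smallest discount factor d satisfying the textbook grim-trigger
    constraint r / (1 - d) >= t + d * p / (1 - d)."""
    return (t - r) / (t - p)


def simulate(delta=0.9, epsilon=0.1, alpha=0.1, periods=20000, seed=0):
    """Return the share of cooperative actions over all periods.

    One-period memory: the state is last period's action profile.
    The constant epsilon and alpha are hypothetical settings.
    """
    rng = random.Random(seed)
    states = [(a, b) for a in ACTIONS for b in ACTIONS]
    # One Q-table per player, indexed by state and then by own action.
    q = [{s: {a: 0.0 for a in ACTIONS} for s in states} for _ in range(2)]
    state = ("C", "C")  # assumed initial state
    cooperative_actions = 0
    for _ in range(periods):
        acts = []
        for i in range(2):
            if rng.random() < epsilon:
                acts.append(rng.choice(ACTIONS))  # explore
            else:
                # Exploit; ties resolve to the first action listed ("C").
                acts.append(max(ACTIONS, key=lambda a: q[i][state][a]))
        next_state = (acts[0], acts[1])
        rewards = PAYOFF[next_state]
        for i in range(2):
            # Standard Q-learning update with discount factor delta.
            target = rewards[i] + delta * max(q[i][next_state].values())
            q[i][state][acts[i]] += alpha * (target - q[i][state][acts[i]])
        cooperative_actions += acts.count("C")
        state = next_state
    return cooperative_actions / (2 * periods)
```

Calling `simulate(delta=0.9, epsilon=0.1)` returns a cooperation rate in $[0, 1]$ for a single run. Averaging over many seeds and sweeping the payoffs and $\delta$ would be one way to trace out a cooperation frontier of the kind described above, although the article's exact protocol may differ.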
