Abstract: Emerging reinforcement learning algorithms that incorporate human traits into their conceptual architecture have been shown to encourage more cooperation in social dilemmas than their unaltered counterparts. In particular, the addition of a mood mechanism facilitates more cooperative behaviour in multi-agent iterated prisoner's dilemma (IPD) games, in both static and dynamic network contexts. Mood-altered agents also exhibit humanlike behavioural trends when environmental aspects of the dilemma are altered, such as the structure of the payoff matrix used. It is possible that other environmental effects from both human and agent-based research will interact with moody structures in previously unstudied ways. As the literature on these interactions is currently small, we seek to expand on previous research by introducing two more environmental dimensions: voluntary interaction in dynamic networks, and stability of interaction through varied network restructuring. Starting from an initial Erdős–Rényi random network, we manipulate the structure of a network IPD according to existing methodology in human-based research, to investigate possible replication of its findings. We also facilitated strategic selection of opponents through the introduction of two partner evaluation mechanisms, and tested two selection thresholds for each. We found that even minimally strategic play termination in dynamic networks is enough to enhance cooperation above a static level, though the thresholds for these strategic decisions are critical to desired outcomes. More forgiving thresholds lead to better maintenance of cooperation between kinder strategies than stricter ones, despite overall cooperation levels being relatively low. Additionally, moody reinforcement learning combined with certain play termination decision strategies can mimic trends in human cooperation affected by structural changes to the IPD played on dynamic networks, as can kind and simplistic strategies such as Tit-For-Tat. The implications of this in comparison with human data are discussed, and suggestions for diversification of further testing are made.
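
As a rough illustration of the setting described in the abstract, the following minimal sketch plays an IPD on an Erdős–Rényi random network and lets agents terminate play with partners who fall below a cooperation threshold. It is not the paper's implementation: the payoff values (T=5, R=3, P=1, S=0), the cooperation-rate rule in `keep_partner`, and all function names are assumptions made for illustration, and fixed strategies (Tit-For-Tat, always defect) stand in for the moody reinforcement learners. Network restructuring, in which new links are formed, is omitted.

```python
import itertools
import random

# Assumed IPD payoffs (T=5, R=3, P=1, S=0); the paper varies the matrix.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def erdos_renyi_edges(n, p, rng):
    """Edge set of a G(n, p) graph: each pair is linked with probability p."""
    return {(i, j) for i, j in itertools.combinations(range(n), 2)
            if rng.random() < p}


def tit_for_tat(partner_moves):
    """Cooperate first, then copy the partner's previous move."""
    return partner_moves[-1] if partner_moves else "C"


def always_defect(partner_moves):
    """Defect regardless of history."""
    return "D"


def keep_partner(partner_moves, threshold):
    """Hypothetical evaluation rule: keep the link if the partner's
    observed cooperation rate is at least `threshold`."""
    if not partner_moves:
        return True
    return partner_moves.count("C") / len(partner_moves) >= threshold


def simulate(strategies, edges, rounds, threshold):
    """Play one IPD round on every edge, then drop each edge that
    either endpoint rejects; repeat for `rounds` rounds."""
    edges = set(edges)
    seen = {}  # seen[(i, j)]: moves of agent j as observed by agent i
    scores = [0] * len(strategies)
    for _ in range(rounds):
        for i, j in sorted(edges):
            move_i = strategies[i](seen.get((i, j), []))
            move_j = strategies[j](seen.get((j, i), []))
            pay_i, pay_j = PAYOFF[(move_i, move_j)]
            scores[i] += pay_i
            scores[j] += pay_j
            seen.setdefault((i, j), []).append(move_j)
            seen.setdefault((j, i), []).append(move_i)
        # Voluntary interaction: a link survives only if both sides keep it.
        edges = {(i, j) for i, j in edges
                 if keep_partner(seen[(i, j)], threshold)
                 and keep_partner(seen[(j, i)], threshold)}
    return edges, scores


if __name__ == "__main__":
    rng = random.Random(0)
    agents = [tit_for_tat] * 6 + [always_defect] * 2
    start = erdos_renyi_edges(len(agents), 0.5, rng)
    final, scores = simulate(agents, start, rounds=10, threshold=0.5)
    print(len(start), "links at start,", len(final), "links after play")
```

In this toy setting, any link to an always-defecting agent is cut after a single round under a threshold of 0.5, while pairs of Tit-For-Tat agents keep playing. A stricter or more forgiving `threshold` changes how quickly links are severed, which is the kind of sensitivity the abstract attributes to the selection thresholds.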

A Cooperative Reinforcement Learning Environment for Detecting and Penalizing Betrayal

Intrinsic fluctuations of reinforcement learning promote cooperation

Cooperation and Reputation Dynamics with Reinforcement Learning

Incorporating Rivalry in Reinforcement Learning for a Competitive Game

Consequentialist conditional cooperation in social dilemmas with imperfect information

Emergence of Cooperation in Two-agent Repeated Games with Reinforcement Learning

Hidden Agenda: a Social Deduction Game with Diverse Learned Equilibria

Emergent Resource Exchange and Tolerated Theft Behavior using Multi-Agent Reinforcement Learning

Towards Cooperation in Sequential Prisoner's Dilemmas: a Deep Multiagent Reinforcement Learning Approach

Enhancing Cooperation through Selective Interaction and Long-term Experiences in Multi-Agent Reinforcement Learning

Learning multiagent coordination in the absence of communication channels

A multi-agent reinforcement learning model of reputation and cooperation in human groups

Birds of a Feather Flock Together: A Close Look at Cooperation Emergence via Multi-Agent RL

‘I don’t want to play with you anymore’: dynamic partner judgements in moody reinforcement learners playing the prisoner’s dilemma

Honesty Is the Best Policy: Defining and Mitigating AI Deception

Learning Not to Spoof

Deconstructing Cooperation and Ostracism via Multi-Agent Reinforcement Learning

Using deep reinforcement learning to promote sustainable human behaviour on a common pool resource problem

The State-Action-Reward-State-Action Algorithm in Spatial Prisoner's Dilemma Game