Q-learning with biased policy rules

Olivier Compte
DOI: https://doi.org/10.48550/arXiv.2304.12647
2023-10-20
Abstract: In dynamic environments, Q-learning is an automaton that (i) provides estimates (Q-values) of the continuation values associated with each available action; and (ii) follows the naive policy of almost always choosing the action with the highest Q-value. We consider a family of automata that are based on Q-values but whose policy may systematically favor some actions over others, for example through a bias that favors cooperation. In the spirit of Compte and Postlewaite [2018], we look for equilibrium biases within this family of Q-based automata. We examine classic games under various monitoring technologies and find that equilibrium biases may strongly foster collusion.
Theoretical Economics, Artificial Intelligence, Computer Science and Game Theory
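
As a rough illustration of the two ingredients named in the abstract, the sketch below pairs a standard Q-value update with a policy that adds a fixed bias to each action's Q-value before taking the maximum. The additive form of the bias, the action labels "C" and "D", the function names, and the parameters alpha (learning rate), delta (discount factor), and epsilon (exploration probability) are all assumptions made for illustration; the abstract does not specify how the paper parameterizes its biases.

```python
import random

def q_update(q, action, reward, next_max, alpha=0.1, delta=0.95):
    """Standard Q-learning update of the Q-value of the chosen action.

    q        : dict mapping action -> current Q-value (modified in place)
    next_max : highest Q-value available in the next period
    alpha    : learning rate (assumed parameter)
    delta    : discount factor (assumed parameter)
    """
    q[action] += alpha * (reward + delta * next_max - q[action])
    return q

def biased_choice(q, bias, epsilon=0.05, rng=random):
    """Choose the action maximizing Q-value plus bias (hypothetical additive bias).

    With probability epsilon a uniformly random action is chosen instead.
    An empty bias dict recovers the naive policy of picking the highest Q-value.
    """
    if rng.random() < epsilon:
        return rng.choice(list(q))
    return max(q, key=lambda a: q[a] + bias.get(a, 0.0))

# Hypothetical Q-values: defection "D" currently looks slightly better.
q_values = {"C": 1.0, "D": 1.2}
naive = biased_choice(q_values, bias={}, epsilon=0.0)
biased = biased_choice(q_values, bias={"C": 0.5}, epsilon=0.0)
```

With exploration switched off, the naive policy selects "D", while a bias of 0.5 toward "C" tips the choice to cooperation even though its Q-value is lower. This is only meant to show how a policy can systematically favor one action while still relying on Q-values.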