Expected Lenient Q-learning: a fast variant of the Lenient Q-learning algorithm for cooperative stochastic Markov games

Elmehdi Amhraoui, Tawfik Masrour
DOI: https://doi.org/10.1007/s13042-023-02063-6
2024-01-11
International Journal of Machine Learning and Cybernetics
Abstract: Lenient Multiagent Reinforcement Learning 2 (LMRL2) is an Independent Learners Algorithm for cooperative multiagent systems that is known to outperform other Independent Learners Algorithms in terms of convergence. However, the algorithm takes longer to converge. In this paper, we first present a new formulation of LMRL2, and then, based on this new formulation, we introduce the Expected Lenient Q-learning Algorithm (LQL). The new formulation demonstrates that LMRL2 performs the same update of Q-values as in standard Q-learning, but with a stochastic learning rate that follows a specified probability distribution. Based on this new formulation, LQL addresses the low speed and instabilities in LMRL2 by updating Q-values using a deterministic and evolving learning rate that equals the expected value of the LMRL2 learning rate. We compared LQL with Decentralized Q-learning, Distributed Q-learning with and without coordination mechanism, Hysteretic Q-learning, and LMRL2. Our experiments on various test problems demonstrated that LQL is highly effective and surpasses all other algorithms in terms of convergence, especially in stochastic domains. Moreover, LQL outperforms LMRL2 in terms of convergence speed, which is why we regard LQL as a faster variant of LMRL2.
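To make the "stochastic learning rate" reading concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a simplified lenient rule in which an update that would lower a Q-value is skipped with a fixed probability `leniency`, so the effective learning rate is either `alpha` or 0. The expected-rate variant then replaces that random rate with its mean, `alpha * (1 - leniency)`. The function names, the fixed `leniency` parameter, and the scalar Q-value are all illustrative assumptions. The abstract says only that the learning rate follows a specified distribution and evolves over time.

```python
import random


def lenient_update(q, target, alpha, leniency, rng):
    """Stochastic-rate view (assumed simplification of a lenient learner).

    Updates that raise q always use rate alpha. Updates that would lower q
    are ignored with probability `leniency`, i.e. the rate is 0 or alpha.
    """
    delta = target - q
    if delta >= 0:
        rate = alpha
    else:
        # Apply the negative update only with probability (1 - leniency).
        rate = alpha if rng.random() >= leniency else 0.0
    return q + rate * delta


def expected_lenient_update(q, target, alpha, leniency):
    """Deterministic variant: use the expected value of the random rate."""
    delta = target - q
    rate = alpha if delta >= 0 else alpha * (1.0 - leniency)
    return q + rate * delta


if __name__ == "__main__":
    rng = random.Random(0)
    # Average many stochastic updates from the same starting point and
    # compare with the single deterministic expected-rate update.
    samples = [lenient_update(1.0, 0.0, 0.5, 0.8, rng) for _ in range(20000)]
    print(sum(samples) / len(samples))
    print(expected_lenient_update(1.0, 0.0, 0.5, 0.8))
```

Under these assumptions the two printed values should be close. That is the intuition behind replacing the random rate by its expectation: the average update is the same, without the sampling noise.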
computer science, artificial intelligence