Credit-of-Q-value for Multi-Agent Reinforcement Learning

Shuaibin Li, Xiu Li, Jinqiang Cui
DOI: https://doi.org/10.23919/ccc55666.2022.9902237
2022-01-01
Abstract: Recently, multi-agent reinforcement learning algorithms based on value function factorization have emerged to address collaborative settings in multi-agent systems. These methods usually fit a joint Q-value function Q_tot, which is treated as the ground truth for the whole system and is used for centralized training. Q_tot can be decomposed into a set of local performance functions Q_i, which have no specific meaning on their own and serve only as a reference for agents to take actions in a distributed manner. Inspired by the Age of Information concept and the multi-scale continuous learning process in biological reinforcement learning, this paper proposes a new concept, Credit-of-Q-value (CoQ), which expresses how confidently actions can be taken based on the current Q-value function. Based on this concept, we propose a corresponding algorithm. We evaluate our method on a challenging set of StarCraft II tasks, and the results show that CoQ significantly improves state-of-the-art value-based multi-agent reinforcement learning methods.
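The factorization idea in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible mixing, a plain sum of local values (as in VDN-style methods); the abstract does not say which mixing function the paper uses, and this is not the CoQ algorithm itself. The Q-values, the function names, and the two-agent setup are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical local values Q_i(a_i) for one fixed state:
# two agents, three actions each. The numbers are made up.
local_q = [
    [0.2, 1.5, -0.3],  # agent 0
    [0.9, 0.1, 0.4],   # agent 1
]

def q_tot(joint_action):
    """Assumed additive mixing: Q_tot is the sum of the selected local values."""
    return sum(q[a] for q, a in zip(local_q, joint_action))

def decentralized_greedy():
    """Each agent takes the argmax of its own Q_i, ignoring the other agents."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in local_q)

def centralized_greedy():
    """Brute-force argmax of Q_tot over every joint action."""
    joint_actions = product(*(range(len(q)) for q in local_q))
    return max(joint_actions, key=q_tot)

# Under additive mixing the two selections coincide.
assert decentralized_greedy() == centralized_greedy()
```

Under this assumed mixing, agents acting greedily on their own Q_i recover the joint action that maximizes Q_tot, which is why Q_tot can be trained centrally while execution stays distributed.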