Sub-AVG: Overestimation reduction for cooperative multi-agent reinforcement learning

Haolin Wu, Jianwei Zhang, Zhuang Wang, Yi Lin, Hui Li
DOI: https://doi.org/10.1016/j.neucom.2021.12.039
IF: 6
2022-01-01
Neurocomputing
Abstract: Decomposing the centralized joint action value (JAV) into per-agent individual action values (IAVs) is attractive in cooperative multi-agent reinforcement learning (MARL). In such tasks, IAVs based on local observations enable decentralized policies, and the JAV is used for end-to-end training through traditional reinforcement learning methods, especially the Q-learning algorithm. However, Q-learning-based methods suffer from overestimation, in which overestimated action values may result in a suboptimal policy. In this paper, we show that such overestimation can occur in the above Q-learning-based decomposition method. Our solution is Sub-AVG, which uses a lower update target by discarding the larger of the previously learned IAVs and averaging the retained ones, thus eliminating the excessive overestimation errors. Experiments in the StarCraft Multi-Agent Challenge (SMAC) environment show that Sub-AVG can lead to lower JAV estimations and better-performing policies.
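The "discard the larger values, average the rest" idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact discarding rule, so the code below assumes one plausible reading: for each action, keep only the snapshot values that do not exceed the mean across snapshots, then average those. The function name `sub_avg`, the array layout, and the threshold rule are all hypothetical and are not taken from the paper.

```python
import numpy as np


def sub_avg(iav_snapshots: np.ndarray) -> np.ndarray:
    """Average the lower part of K previously learned IAV estimates.

    iav_snapshots has shape (K, n_actions): one row per earlier snapshot
    of a single agent's individual action values (hypothetical layout).
    Returns an array of shape (n_actions,).
    """
    # Per-action mean over the K snapshots.
    mean = iav_snapshots.mean(axis=0, keepdims=True)

    # ASSUMPTION: "discarding the larger" is read as dropping values
    # above the per-action mean; the abstract does not specify the rule.
    keep = iav_snapshots <= mean

    # Always keep the per-action minimum so no column ends up empty,
    # even under floating-point rounding of the mean.
    keep |= iav_snapshots == iav_snapshots.min(axis=0, keepdims=True)

    # Average only the retained values for each action.
    kept_sum = np.where(keep, iav_snapshots, 0.0).sum(axis=0)
    return kept_sum / keep.sum(axis=0)


# Three snapshots of IAVs for two actions.
snapshots = np.array([[1.0, 4.0],
                      [2.0, 4.0],
                      [6.0, 10.0]])
target_values = sub_avg(snapshots)  # 1.5 and 4.0; plain means are 3.0 and 6.0
```

Under this assumption the result can never exceed the plain average of the snapshots, which matches the abstract's description of a lower update target.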