TLMIX: Twin Leader Mixing Network for Cooperative Multi-Agent Reinforcement Learning.

Yu Zhang, Pengyu Gao, Zhe Wu, Yusheng Jiang, Junliang Xing, Pin Tao
DOI: https://doi.org/10.1109/ijcnn54540.2023.10191504
2023-01-01
Abstract: Recent methods for cooperative multiagent reinforcement learning built on the Individual-Global-Max (IGM) value decomposition principle show promising results using variants of deep mixing networks, and credit assignment plays a crucial role in these methods. However, each agent in a multiagent system requires not only credit assignment but also credit feedback, which tells each agent how much reward it should obtain to maximize the expected cumulative global reward. In this work, we propose TLMIX, a novel Twin Leader Mixing Network for cooperative multiagent reinforcement learning that maintains the centralized training and decentralized execution (CTDE) paradigm. TLMIX introduces a leader network that addresses the credit feedback issue by using global information to provide reasonable objectives for the agent networks. TLMIX also introduces a twin mixing network to obtain a more accurate target from the Q-value functions. This avoids the rapid growth in parameter count that giving each individual agent its own twin network would cause, and it effectively mitigates the accumulation of high overestimation errors caused by temporal difference updates. Extensive results on SMAC (StarCraft Multi-Agent Challenge) scenarios and the Predator-Prey environment demonstrate that TLMIX significantly outperforms comparable baseline algorithms in both convergence speed and performance.
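The abstract does not give TLMIX's exact target formula, so the sketch below only illustrates the general idea it describes: applying the twin network at the level of the mixer rather than per agent, and bootstrapping from the smaller of two estimates. It assumes a TD3-style clipped double-Q rule (take the minimum of two mixers' outputs), which is a swapped-in technique and not confirmed as the authors' method. `MonotonicMixer`, `greedy_agent_qs`, and `twin_mixing_target` are hypothetical names. The mixer is a toy with fixed non-negative weights. Real QMIX-style mixers generate these weights from the global state with hypernetworks, and this sketch omits that.

```python
from typing import List, Sequence


class MonotonicMixer:
    """Toy stand-in for a QMIX-style mixing network (hypothetical, not TLMIX's)."""

    def __init__(self, weights: Sequence[float], bias: float) -> None:
        # abs() keeps every mixing weight non-negative, so Q_tot is
        # monotonically non-decreasing in each agent's Q-value (IGM-consistent).
        self.weights = [abs(w) for w in weights]
        self.bias = bias

    def __call__(self, agent_qs: Sequence[float]) -> float:
        # Weighted sum of per-agent Q-values plus a bias term.
        return sum(w * q for w, q in zip(self.weights, agent_qs)) + self.bias


def greedy_agent_qs(per_agent_q: Sequence[Sequence[float]]) -> List[float]:
    # With a monotonic mixer, each agent maximizing its own Q also maximizes
    # Q_tot, so the joint argmax decomposes into per-agent maxima.
    return [max(qs) for qs in per_agent_q]


def twin_mixing_target(reward: float, gamma: float, done: bool,
                       next_per_agent_q: Sequence[Sequence[float]],
                       mixer_a: MonotonicMixer,
                       mixer_b: MonotonicMixer) -> float:
    next_qs = greedy_agent_qs(next_per_agent_q)
    # Assumed clipped double-Q rule: the minimum of the two mixers' estimates
    # biases the bootstrap value downward, counteracting max-induced overestimation.
    bootstrap = min(mixer_a(next_qs), mixer_b(next_qs))
    # No bootstrapping past a terminal transition.
    return reward + gamma * (0.0 if done else 1.0) * bootstrap


if __name__ == "__main__":
    next_q = [[1.0, 3.0], [2.0, 0.5]]        # 2 agents, 2 actions each
    mixer_a = MonotonicMixer([0.5, -1.0], 0.0)  # Q_tot = 0.5*3 + 1*2 = 3.5
    mixer_b = MonotonicMixer([1.0, 1.0], 0.5)   # Q_tot = 3 + 2 + 0.5 = 5.5
    target = twin_mixing_target(1.0, 0.9, False, next_q, mixer_a, mixer_b)
    print(round(target, 4))  # → 4.15 (1.0 + 0.9 * min(3.5, 5.5))
```

A single extra mixer adds only one network's worth of parameters, regardless of how many agents there are. Twinning every agent network would instead double the parameters of each agent, which is the scaling cost the abstract says TLMIX avoids.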
What problem does this paper attempt to address?