Reducing overestimation in value mixing for cooperative deep multi-agent reinforcement learning

Z Fu, Q Zhao, W Zhang
2020-01-01
Abstract: Since the debut of the Deep Q-Network (DQN), considerable research has been conducted on integrating Deep Neural Networks (DNNs) with Reinforcement Learning (RL). The expressive power of DNNs enables RL, which was previously practical mostly in simple discrete or tabular settings, to solve complex problems in continuous and high-dimensional settings. Recently, Deep RL has been adapted to Multi-Agent Systems (MAS). In many real-world scenarios, a group of agents, each generally with different local observations, needs to cooperate to achieve a collective reward. Although execution is decentralized, global state information can be shared among agents in a laboratory setting during the rehearsal period. We propose double QMIX, an end-to-end multi-agent Q-learning method that reduces value overestimation and trains decentralized agents’ policies in a centralized setting. The centralized Q-value is computed from each agent’s utility in a non-linear manner designed to counter overestimation. We provide a theoretical analysis of why traditional DQN training methods lead to significant value overestimation in the multi-agent setting, and we explain how double QMIX addresses this problem. We also evaluate double QMIX in the StarCraft II micromanagement environment, where it performs better than other state-of-the-art value-based multi-agent reinforcement learning methods.
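The overestimation the abstract refers to can be illustrated with a small numerical sketch. This is not the paper's double QMIX implementation, which the abstract does not specify. It only shows the general double-estimator idea from double Q-learning: taking a max over noisy value estimates is biased upward, while selecting the action with one estimate and evaluating it with an independent one removes that bias. The function name `estimate_bias`, the zero-valued true Q-values, and the Gaussian noise model are all assumptions made for illustration.

```python
import numpy as np


def estimate_bias(n_actions=10, n_trials=20000, noise_std=1.0, seed=0):
    """Compare the bias of a single max estimator with a double estimator.

    Hypothetical setup: every action has true value 0, so the true
    maximum is 0 and any nonzero average is pure estimation bias.
    """
    rng = np.random.default_rng(seed)
    true_q = np.zeros(n_actions)

    # Two independent noisy estimates of the same true values.
    q_a = true_q + rng.normal(0.0, noise_std, size=(n_trials, n_actions))
    q_b = true_q + rng.normal(0.0, noise_std, size=(n_trials, n_actions))

    # Single estimator: select and evaluate with the same estimate.
    single = q_a.max(axis=1)

    # Double estimator: select with q_a, evaluate with independent q_b.
    greedy = q_a.argmax(axis=1)
    double = q_b[np.arange(n_trials), greedy]

    true_max = true_q.max()
    return single.mean() - true_max, double.mean() - true_max


single_bias, double_bias = estimate_bias()
print(f"single-estimator bias: {single_bias:.3f}")
print(f"double-estimator bias: {double_bias:.3f}")
```

Under these assumptions the single-estimator bias is clearly positive, while the double-estimator bias stays close to zero. How this carries over to a mixed, centralized Q-value is the subject of the paper's analysis and is not reproduced here.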