Correcting Biased Value Estimation in Mixing Value-Based Multi-Agent Reinforcement Learning by Multiple Choice Learning.

Bing Liu, Yuxuan Xie, Lei Feng, Ping Fu
DOI: https://doi.org/10.1016/j.engappai.2022.105329
IF: 8
2022-01-01
Engineering Applications of Artificial Intelligence
Abstract: Multi-agent reinforcement learning (MARL) has become increasingly popular over the past decades, and many value-based MARL methods have been proposed in the past few years. Neural networks play an important role in these methods: they predict the value of each state–action pair (the Q-value), and agents choose their actions based on it. However, inaccurate predictions by the neural network lead to biased Q-value estimates, which cause inefficient use of experience data and poor performance. Unlike ensemble methods, which only reduce the variance of predictions, multiple choice learning (MCL) methods exploit cooperation among all the candidate models. This paper corrects the biased Q-value by exploiting the collaboration between the ensemble model and MARL to obtain a more stable and more precise Q-value estimator. We develop a new MARL method, Multiple Choice QMIX, to address the biased Q-value issue; it also extends the application scenarios of MCL methods. Specifically, we propose a voting network that learns the confidence level of each estimator and can thus provide the best prediction by combining their results. We also propose a voting hindsight loss that encourages the voting network to overcome overestimation of the Q-value. We conduct experiments on four challenging tasks of the StarCraft II micromanagement benchmark. The experimental results show that our method converges faster and performs more stably in multi-agent tasks.
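The abstract does not say how the voting network combines the candidate estimators, so the following is only a minimal sketch of one plausible reading: each of K estimators proposes a Q-value, a voting score per estimator is turned into a confidence weight, and the final estimate is the confidence-weighted sum. The softmax weighting and the names `softmax`, `combine_q_estimates`, `q_estimates`, and `vote_logits` are assumptions made for illustration, not the paper's implementation.

```python
import math


def softmax(logits):
    """Turn raw voting scores into weights that sum to 1 (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_q_estimates(q_estimates, vote_logits):
    """Confidence-weighted combination of K candidate Q-value estimates.

    q_estimates: one Q-value per candidate estimator (hypothetical input).
    vote_logits: one unnormalized confidence score per estimator, standing in
                 for the output of a voting network (hypothetical input).
    """
    if len(q_estimates) != len(vote_logits):
        raise ValueError("need exactly one vote per estimator")
    weights = softmax(vote_logits)
    return sum(w * q for w, q in zip(weights, q_estimates))


# Equal confidence reduces to a plain ensemble average.
q_equal = combine_q_estimates([1.0, 3.0], [0.0, 0.0])

# A strongly preferred estimator dominates the combined value.
q_confident = combine_q_estimates([1.0, 3.0], [100.0, 0.0])
```

With equal votes the sketch behaves like an ordinary averaging ensemble; unequal votes let the combination lean toward the estimator judged more reliable, which is the behavior the abstract attributes to the voting network.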