Deep Factorized Q-Learning for Large Scale Multi-Agent Learning

Xiaoqiang Wang, Liangjun Ke
DOI: https://doi.org/10.1109/cei57409.2022.9950106
2022-01-01
Abstract: Value function decomposition is an effective way to alleviate the curse of dimensionality in Multi-Agent Reinforcement Learning (MARL). However, existing methods either provide only a low-order approximate decomposition (no higher than second order) or require considerable effort to manually design the high-order interactions among agents based on experience. As a result, they either tend to incur large decomposition error or are inconvenient to use. In this paper, a high-order approximate value function decomposition method that can be trained end-to-end is proposed. Its prominent features are: low-rank vectors are exploited to represent the value function; the low- and high-order components share the same input (i.e., the embedding vector); and the model parameters are shared among all agents if they are homogeneous. Experimental results show that our method is effective.
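The abstract does not spell out the architecture, so the following is only an illustrative sketch of how the three listed features could fit together, assuming a factorization-machine-style layout. It is not the authors' implementation. All names (`fm_pairwise`, `factorized_q`, `W_emb`, `w_lin`, `W_h`, `b_h`, `w_out`) and the choice of a single ReLU layer for the high-order part are hypothetical. Each agent's feature vector is mapped to an embedding by one shared matrix, the second-order term is the sum of pairwise inner products of those low-rank embeddings, and the high-order term reads the same embeddings.

```python
import numpy as np

def fm_pairwise(emb):
    """Sum over agent pairs i < j of <emb[i], emb[j]>, in O(n_agents * k) time."""
    sum_sq = np.sum(emb, axis=0) ** 2   # (sum_i e_i)^2 per embedding dimension
    sq_sum = np.sum(emb ** 2, axis=0)   # sum_i e_i^2 per embedding dimension
    return 0.5 * float(np.sum(sum_sq - sq_sum))

def factorized_q(features, params):
    """Hypothetical joint value: linear + low-rank pairwise + high-order term.

    features: (n_agents, d) array, one row per agent (assumed encoding of
    that agent's observation and action).
    """
    # Shared embedding: the same W_emb is applied to every (homogeneous) agent.
    emb = features @ params["W_emb"]                    # (n_agents, k)
    first = float(np.sum(features @ params["w_lin"]))   # first-order term
    second = fm_pairwise(emb)                           # second-order term
    # High-order term: a small network over the SAME embeddings (assumption).
    flat = emb.reshape(-1)                              # (n_agents * k,)
    hidden = np.maximum(0.0, flat @ params["W_h"] + params["b_h"])
    high = float(hidden @ params["w_out"])
    return first + second + high

# Usage with random, untrained parameters (sizes are arbitrary).
rng = np.random.default_rng(0)
n_agents, d, k, h = 4, 6, 3, 8
params = {
    "W_emb": rng.normal(size=(d, k)),
    "w_lin": rng.normal(size=d),
    "W_h": rng.normal(size=(n_agents * k, h)),
    "b_h": np.zeros(h),
    "w_out": rng.normal(size=h),
}
features = rng.normal(size=(n_agents, d))
q_value = factorized_q(features, params)
```

The pairwise term uses the standard factorization-machine identity, so its cost grows linearly with the number of agents rather than quadratically, which is the kind of saving a low-rank representation is meant to provide at large scale.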