Q-SAT: Value Factorization with Self-Attention for Deep Multi-Agent Reinforcement Learning

Xunhan Hu, Jian Zhao, Youpeng Zhao, Wengang Zhou, Houqiang Li
DOI: https://doi.org/10.1109/IJCNN54540.2023.10191777
2023-01-01
Abstract: In many real-world tasks, a team of agents learns to cooperate under partial observability and communication constraints, a setting in which value factorization has proven to be an effective solution. In a multi-agent system, it is important to capture the interconnections between agents and to encourage each agent to take its relevant teammates into account. Motivated by the success of self-attention in natural language processing and computer vision, we propose a novel value factorization mechanism called Q-function Self ATtention (Q-SAT). It explicitly models the pairwise action-value functions and the connection coefficients between agent pairs, and pays more attention to interrelated agents when making decisions. Q-SAT introduces self-attention into the value factorization network while satisfying the Individual-Global-Max (IGM) principle. This attention mechanism enables more effective and efficient learning in complex multi-agent environments. Q-SAT can serve as a basic building block that is readily applied to existing value factorization methods. Experimental results show that Q-SAT captures the connection relationships between agents and significantly improves learning performance on the challenging StarCraft II micromanagement task and the Google Research Football task.
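The abstract does not state Q-SAT's equations, so the following is only a minimal sketch of the general idea: attention-derived connection coefficients weighting pairwise action-value terms, mixed in a way that preserves IGM. Several assumptions are made for illustration and are not taken from the paper: each agent's query/key embeddings are given as fixed vectors (in practice they would come from a learned state encoder), the coefficients alpha_ij are a row-wise softmax over scaled dot products, and each pairwise term is the hypothetical form Q_ij = (Q_i + Q_j) / 2. All function names are hypothetical.

```python
import itertools
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attention_coefficients(queries: List[List[float]], keys: List[List[float]]) -> List[List[float]]:
    """alpha[i][j]: scaled dot-product attention of agent i over agent j.
    Each row sums to 1 and every entry is non-negative."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    alpha = []
    for q in queries:
        scores = [scale * sum(qa * ka for qa, ka in zip(q, k)) for k in keys]
        alpha.append(softmax(scores))
    return alpha


def q_tot(individual_q: List[float], alpha: List[List[float]]) -> float:
    """Mix the agents' chosen-action utilities Q_i into a joint value.
    ASSUMPTION: the pairwise term is Q_ij = (Q_i + Q_j) / 2 (illustrative only)."""
    n = len(individual_q)
    total = 0.0
    for i in range(n):
        for j in range(n):
            q_ij = 0.5 * (individual_q[i] + individual_q[j])
            total += alpha[i][j] * q_ij
    return total


def greedy_joint_action(q_tables: List[List[float]]) -> List[int]:
    # Each agent independently picks the action with the highest utility.
    return [max(range(len(qs)), key=lambda a: qs[a]) for qs in q_tables]


def satisfies_igm(q_tables: List[List[float]], alpha: List[List[float]]) -> bool:
    """Brute-force check: does the per-agent greedy joint action maximize Q_tot?"""
    def value(joint):
        return q_tot([qs[a] for qs, a in zip(q_tables, joint)], alpha)

    all_joints = itertools.product(*(range(len(qs)) for qs in q_tables))
    best = max(value(joint) for joint in all_joints)
    return math.isclose(value(greedy_joint_action(q_tables)), best)


# Toy example: three agents with hand-picked embeddings and per-action utilities.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alpha = attention_coefficients(embeddings, embeddings)
q_tables = [[0.1, 0.5], [0.3, -0.2, 0.4], [1.0, 0.0]]
greedy = greedy_joint_action(q_tables)
```

The design choice that keeps this sketch IGM-consistent is that every alpha_ij is non-negative and depends only on the embeddings, not on the actions. Q_tot therefore expands to a positive-weighted sum of the individual Q_i, so it is non-decreasing in each of them, and each agent's local argmax also maximizes the joint value. If the coefficients were allowed to depend on actions or become negative, that guarantee would no longer hold in general.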