MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning

Haolin Song, Mingxiao Feng, Wengang Zhou, Houqiang Li
2023-06-03
Abstract: Recent approaches have used self-supervised auxiliary tasks for representation learning to improve the performance and sample efficiency of vision-based reinforcement learning algorithms in single-agent settings. In multi-agent reinforcement learning (MARL), however, these techniques face challenges: each agent receives only a partial observation of an environment that is influenced by the other agents, so observations are correlated along the agent dimension. It is therefore necessary to consider agent-level information in representation learning for MARL. In this paper, we propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL), which encourages the learned representation to be both temporally and agent-level predictive by reconstructing masked agent observations in latent space. Specifically, we use an attention-based reconstruction model to recover the masked observations, and this model is trained via contrastive learning. MA2CL makes better use of contextual information at the agent level, facilitating the training of MARL agents on cooperative tasks. Extensive experiments demonstrate that our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios. Our code is available at https://github.com/ustchlsong/MA2CL.
Machine Learning
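As a rough illustration of the mechanism described in the abstract (masking some agents' observations in latent space and recovering them with attention over the other agents), here is a minimal sketch using only the Python standard library. It is not the MA2CL implementation: the function names, the zero mask token, the mask ratio, and the identity query/key/value projections are all assumptions chosen for brevity, whereas the actual model learns its encoder and attention parameters.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mask_agents(latents: List[Vector], mask_ratio: float, rng: random.Random,
                mask_token: Vector) -> Tuple[List[Vector], List[int]]:
    """Replace a random subset of agents' latent vectors with a shared mask token.

    Hypothetical helper: the zero mask token and ratio are illustrative choices.
    """
    n = len(latents)
    k = max(1, min(n, round(mask_ratio * n)))  # mask at least one agent
    masked_ids = sorted(rng.sample(range(n), k))
    tokens = [list(mask_token) if i in masked_ids else list(z)
              for i, z in enumerate(latents)]
    return tokens, masked_ids


def attend_over_agents(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention across the agent axis.

    Identity query/key/value projections are an assumption for brevity;
    a trained reconstruction model would learn these projections.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        out.append([sum(w * t[c] for w, t in zip(weights, tokens)) for c in range(d)])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    latents = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]  # toy per-agent latents
    tokens, masked_ids = mask_agents(latents, mask_ratio=0.5, rng=rng, mask_token=[0.0, 0.0])
    recon = attend_over_agents(tokens)
    for i in masked_ids:
        print(f"agent {i}: reconstructed latent {recon[i]}")
```

Because the mask token here is a zero vector, a masked agent's query gives every agent the same score, so its reconstruction is simply the average of all agent tokens; learned projections would instead let it weight the most informative teammates.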
What problem does this paper attempt to address?
This paper addresses representation learning in multi-agent reinforcement learning (MARL), specifically how to exploit both temporal information and agent-level information in multi-agent environments. Self-supervised auxiliary tasks have proven effective in single-agent settings, but they face challenges in multi-agent settings: each agent receives only a partial observation of the environment, and that environment is also shaped by the behavior of the other agents. As a result, observations are correlated across agents, which makes it difficult to build an effective dynamics model from temporal information alone. Moreover, in cooperative tasks all agents must work together toward a common goal, so agent-level information becomes especially important for decision-making.
To address these problems, the paper proposes a framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL). MA2CL encourages representations that are both temporally predictive and agent-level predictive by reconstructing masked agent observations in latent space. Specifically, an attention-based reconstruction model recovers the masked agent observations, and this model is trained through contrastive learning. This lets agents make better use of agent-level contextual information, which in turn facilitates the training of MARL algorithms on cooperative tasks.
The main contributions of MA2CL are:
1. MA2CL, an attentive contrastive representation learning framework for MARL algorithms, designed to help agents learn effective representations.
2. Implementations of MA2CL on top of MAT and MAPPO, demonstrating that it is flexible and can be integrated into various off-the-shelf MARL algorithms.
3. Extensive experiments showing that MA2CL outperforms previous methods in both vision-based and state-based multi-agent environments and achieves state-of-the-art performance.
In summary, MA2CL improves sample efficiency in MARL by adding agent-level prediction as a learning objective, which is particularly helpful when agents must make decisions from incomplete information.
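The contrastive training signal can be sketched with a standard InfoNCE loss. In this sketch, which is an assumption rather than the paper's exact objective, each reconstructed latent is scored by cosine similarity against a set of target latents (for example, latents of the unmasked observations); the matching target is the positive and the other targets serve as negatives. The function names and the temperature value are illustrative, not taken from the MA2CL code.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine similarity with a small epsilon to avoid dividing by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def info_nce_loss(predictions: List[Sequence[float]], targets: List[Sequence[float]],
                  temperature: float = 0.1) -> float:
    """Mean InfoNCE loss where predictions[i] should match targets[i].

    All other entries of `targets` act as negatives for predictions[i].
    The temperature of 0.1 is a hypothetical default.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    losses = []
    for i, pred in enumerate(predictions):
        logits = [cosine_similarity(pred, t) / temperature for t in targets]
        m = max(logits)  # log-sum-exp trick for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_sum_exp - logits[i])  # -log softmax probability of the positive
    return sum(losses) / len(losses)


if __name__ == "__main__":
    targets = [[1.0, 0.0], [0.0, 1.0]]
    aligned = info_nce_loss([[0.9, 0.1], [0.1, 0.9]], targets)
    swapped = info_nce_loss([[0.1, 0.9], [0.9, 0.1]], targets)
    print(f"aligned predictions: {aligned:.4f}, swapped predictions: {swapped:.4f}")
```

Predictions that point toward their own targets yield a loss near zero, while predictions matched to the wrong agent are penalized heavily. In auxiliary-task setups of this kind, such a loss is typically added to the reinforcement learning objective with a weighting coefficient, which this sketch treats as a tunable hyperparameter.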