Multi-Agent Transfer Learning via Temporal Contrastive Learning

Weihao Zeng, Joseph Campbell, Simon Stepputtis, Katia Sycara
2024-06-03
Abstract: This paper introduces a novel transfer learning framework for deep multi-agent reinforcement learning. The approach automatically combines goal-conditioned policies with temporal contrastive learning to discover meaningful sub-goals. It proceeds in three steps: pre-training a goal-conditioned agent, finetuning it on the target domain, and using contrastive learning to construct a planning graph that guides the agent via sub-goals. Experiments on multi-agent coordination tasks in Overcooked demonstrate improved sample efficiency, the ability to solve sparse-reward and long-horizon problems, and enhanced interpretability compared to baselines. The results highlight the effectiveness of integrating goal-conditioned policies with unsupervised temporal abstraction learning for complex multi-agent transfer learning. Compared to state-of-the-art baselines, our method achieves the same or better performance while requiring only 21.7% of the training samples.
Artificial Intelligence
What problem does this paper attempt to address?
This paper aims to improve the sample efficiency of transfer learning (TL) in multi-agent reinforcement learning (MARL), especially on sparse-reward and long-horizon problems. The authors propose a new transfer learning framework that combines goal-conditioned policies with temporal contrastive learning to discover meaningful sub-goals, thereby improving learning efficiency and performance on new tasks. The main contributions of the paper are as follows:
1. A novel transfer learning method that enables agents to learn new tasks efficiently while leveraging previous experience.
2. A combination of goal-conditioned policies with unsupervised temporal abstraction learning that improves sample efficiency and adaptability.
In experiments in the multi-agent collaborative environment "Overcooked", the method requires substantially fewer training samples than existing baselines (21.7% of the samples for the same or better performance, according to the abstract), solves sparse-reward and long-horizon problems, and makes the learning process more interpretable.
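To make the idea of temporal contrastive learning more concrete, the following is a minimal sketch of a generic InfoNCE-style objective, in which states that occur close together in time within a trajectory are treated as positive pairs. This summary does not specify the paper's actual encoder, loss, or planning-graph construction, so everything here is an assumption for illustration: the function names `temporal_pairs` and `info_nce_loss`, the `max_gap` and `temperature` parameters, and the use of raw state vectors in place of learned embeddings are all hypothetical.

```python
import numpy as np


def temporal_pairs(trajectory, max_gap, rng):
    """Sample (state_t, state_{t+k}) pairs with 1 <= k <= max_gap.

    `trajectory` is an array of shape (T, d); assumes T > max_gap.
    """
    num_steps = len(trajectory)
    n = num_steps - max_gap
    t = rng.integers(0, n, size=n)            # anchor time indices
    k = rng.integers(1, max_gap + 1, size=n)  # temporal offsets
    return trajectory[t], trajectory[t + k]


def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss: row i of `anchors` should match row i of
    `positives`; all other rows in the batch act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for an encoded trajectory: a smooth random walk.
    trajectory = np.cumsum(rng.normal(size=(200, 8)), axis=0)
    anchors, positives = temporal_pairs(trajectory, max_gap=3, rng=rng)
    print("loss on temporally adjacent pairs:", info_nce_loss(anchors, positives))
```

In a setup like this, an encoder trained to minimize the loss would place temporally nearby states close together in embedding space; clusters in that space could then serve as candidate sub-goals, although how the paper groups them into a planning graph is not described in this summary.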