Selective Policy Transfer in Multi-Agent Systems with Sparse Interactions

Yunkai Zhuang, Yong Liu, Shangdong Yang, Yang Gao
DOI: https://doi.org/10.1016/j.knosys.2024.112031
IF: 8.139
2024-01-01
Knowledge-Based Systems
Abstract: Previously trained single-agent strategies, which are considerably easier to acquire than multi-agent strategies, are instructive for multi-agent reinforcement learning, especially when the interactions between agents are sparse. Traditional methods of knowledge transfer from single-agent source tasks to multi-agent target tasks typically rely on a pre-designed Markov-decision-process similarity function or metric function to evaluate task similarity. In this study, we propose a selective policy transfer (SPOT) algorithm that eliminates the need to manually craft a metric function for assessing the similarity between source and target tasks. The SPOT algorithm enables agents to autonomously determine when to transfer and which policy to transfer by using well-trained single-agent policies as options during training. We also introduce a multi-agent policy-learning option into the option library, which allows the SPOT algorithm to leverage the transferred knowledge while concurrently learning new policies. The SPOT algorithm efficiently transfers sequential strategies in a few steps, thereby capturing high-level semantics. Experimental results obtained in both multi-agent arcade and multi-agent particle environments demonstrate that the proposed algorithm outperforms state-of-the-art methods in terms of jump start and convergence speed. The evaluation indicators are the jump start, steps to threshold, cumulative reward, and asymptotic performance. Furthermore, visualizations of the strategies and termination functions of the agents help elucidate the operational principles of the SPOT algorithm.
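The four evaluation indicators named in the abstract can be illustrated with a minimal sketch. The definitions below follow conventions commonly used in the transfer-learning literature and are an assumption here; the paper may define them differently. The function names, the window sizes `k`, and the two reward curves are hypothetical and made up purely for illustration, not taken from the paper's experiments.

```python
from statistics import mean


def jump_start(transfer_curve, baseline_curve, k=1):
    """Gap in mean reward over the first k episodes (transfer minus baseline)."""
    return mean(transfer_curve[:k]) - mean(baseline_curve[:k])


def steps_to_threshold(curve, threshold):
    """Index of the first episode whose reward reaches the threshold, else None."""
    for episode, reward in enumerate(curve):
        if reward >= threshold:
            return episode
    return None


def cumulative_reward(curve):
    """Total reward accumulated over the whole curve."""
    return sum(curve)


def asymptotic_performance(curve, k=3):
    """Mean reward over the last k episodes, a proxy for final performance."""
    return mean(curve[-k:])


# Made-up per-episode rewards: learning from scratch vs. learning with transfer.
baseline = [0, 1, 2, 3, 4, 5, 5, 5]
transfer = [2, 4, 5, 5, 5, 5, 5, 5]

print("jump start:", jump_start(transfer, baseline))
print("steps to threshold:", steps_to_threshold(baseline, 5),
      "vs", steps_to_threshold(transfer, 5))
print("cumulative reward:", cumulative_reward(baseline),
      "vs", cumulative_reward(transfer))
print("asymptotic performance:", asymptotic_performance(baseline),
      "vs", asymptotic_performance(transfer))
```

In this toy example both curves settle at the same final reward, so transfer shows up only in the first three indicators; that is the typical pattern when transfer speeds up learning without changing what is eventually learned.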