Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning

Peihong Yu, Manav Mishra, Alec Koppel, Carl Busart, Priya Narayan, Dinesh Manocha, Amrit Bedi, Pratap Tokekar
2024-03-14
Abstract: Multi-Agent Reinforcement Learning (MARL) algorithms face the challenge of efficient exploration due to the exponential increase in the size of the joint state-action space. While demonstration-guided learning has proven beneficial in single-agent settings, its direct applicability to MARL is hindered by the practical difficulty of obtaining joint expert demonstrations. In this work, we introduce a novel concept of personalized expert demonstrations, tailored for each individual agent or, more broadly, each individual type of agent within a heterogeneous team. These demonstrations solely pertain to single-agent behaviors and how each agent can achieve personal goals without encompassing any cooperative elements, thus naively imitating them will not achieve cooperation due to potential conflicts. To this end, we propose an approach that selectively utilizes personalized expert demonstrations as guidance and allows agents to learn to cooperate, namely personalized expert-guided MARL (PegMARL). This algorithm utilizes two discriminators: the first provides incentives based on the alignment of policy behavior with demonstrations, and the second regulates incentives based on whether the behavior leads to the desired objective. We evaluate PegMARL using personalized demonstrations in both discrete and continuous environments. The results demonstrate that PegMARL learns near-optimal policies even when provided with suboptimal demonstrations, and outperforms state-of-the-art MARL algorithms in solving coordinated tasks. We also showcase PegMARL's capability to leverage joint demonstrations in the StarCraft scenario and converge effectively even with demonstrations from non-co-trained policies.
Multiagent Systems, Artificial Intelligence, Robotics
What problem does this paper attempt to address?
This paper asks how personalized expert demonstrations can guide learning in multi-agent reinforcement learning (MARL) when joint expert demonstrations are unavailable. Personalized demonstrations show only single-agent behavior, with one set per agent or per type of agent. Joint demonstrations are of limited practical use because collecting them is labor- and resource-intensive, and they must be recollected whenever the number or type of agents changes. The paper therefore proposes Personalized Expert-Guided MARL (PegMARL), which uses personalized single-agent demonstrations to make multi-agent learning more efficient.

The core idea of PegMARL is to dynamically reshape the original rewards through two discriminators to assist exploration. The first discriminator provides incentives according to how closely the policy's behavior aligns with the demonstrations. The second discriminator adjusts the weight of those incentives according to whether the behavior leads to the desired objective. According to the paper, PegMARL learns near-optimal policies even from suboptimal personalized demonstrations and outperforms existing state-of-the-art MARL algorithms on coordination tasks. It can also use joint demonstrations in complex environments such as StarCraft, even when those demonstrations come from policies that were not co-trained.

The main contributions of the paper include:
1. Proposing what the authors describe as the first method that can use personalized demonstrations for policy learning in heterogeneous multi-agent environments, regardless of the number and type of agents.
2. Designing the PegMARL algorithm, which dynamically and selectively reshapes the original rewards to assist exploration and is compatible with most MARL policy gradient methods.
3. Evaluating PegMARL in discrete grid worlds and continuous multi-agent particle environments, where it outperforms other state-of-the-art decentralized MARL algorithms, pure multi-agent imitation learning, and reward shaping techniques in scalability and convergence speed.
4. Demonstrating that PegMARL can use joint demonstrations and still converge effectively when those demonstrations come from policies that were not co-trained.
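
The two-discriminator reward reshaping described above can be illustrated with a small Python sketch. This is not the paper's formulation: the class name `TwoDiscriminatorShaper`, the weight `eta`, the multiplicative gating, and both toy scoring functions are assumptions made for illustration. In practice both scores would come from learned discriminators rather than hand-written rules.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

State = Tuple[int, int]   # toy grid cell (row, col); an assumption for this sketch
Action = str              # toy action label such as "right"


@dataclass
class TwoDiscriminatorShaper:
    """Hypothetical reward shaper; not the paper's exact formulation."""
    # Score in [0, 1]: how demonstration-like is (state, action)?
    behavior_score: Callable[[State, Action], float]
    # Score in [0, 1]: did the transition reach the outcome the demonstration implies?
    outcome_score: Callable[[State, Action, State], float]
    eta: float = 0.5      # assumed weight on the shaping term

    def shaped_reward(self, env_reward: float, state: State,
                      action: Action, next_state: State) -> float:
        incentive = self.behavior_score(state, action)
        gate = self.outcome_score(state, action, next_state)
        # The incentive only counts to the extent the outcome score allows it.
        return env_reward + self.eta * incentive * gate


def toy_behavior_score(state: State, action: Action) -> float:
    # Stand-in for a learned discriminator: pretend the demonstrations move right.
    return 0.8 if action == "right" else 0.2


def toy_outcome_score(state: State, action: Action, next_state: State) -> float:
    # Stand-in for a learned discriminator: 0 if the agent was blocked and did not move.
    return 1.0 if next_state != state else 0.0


shaper = TwoDiscriminatorShaper(toy_behavior_score, toy_outcome_score, eta=0.5)
unblocked = shaper.shaped_reward(0.0, (0, 0), "right", (0, 1))
blocked = shaper.shaped_reward(0.0, (0, 0), "right", (0, 0))
```

In this toy setup, a demonstration-like action earns a bonus only when the agent actually moves. When another agent blocks the move, the gate drops the bonus to zero, which mirrors the idea that imitating personalized demonstrations should not be rewarded when it causes conflicts.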