Abstract: Recent advances in multi-agent reinforcement learning (MARL) allow agents to coordinate their behaviors in complex environments. However, common MARL algorithms still suffer from scalability and sparse reward issues. One promising approach to resolving them is automatic curriculum learning (ACL). ACL involves a student (curriculum learner) training on tasks of increasing difficulty controlled by a teacher (curriculum generator). Despite its success, ACL's applicability is limited by (1) the lack of a general student framework for dealing with the varying number of agents across tasks and the sparse reward problem, and (2) the non-stationarity of the teacher's task due to ever-changing student strategies. As a remedy for ACL, we introduce a novel automatic curriculum learning framework, Skilled Population Curriculum (SPC), which adapts curriculum learning to multi-agent coordination. Specifically, we endow the student with population-invariant communication and a hierarchical skill set, allowing it to learn cooperation and behavior skills from distinct tasks with varying numbers of agents. In addition, we model the teacher as a contextual bandit conditioned by student policies, enabling a team of agents to change its size while still retaining previously acquired skills. We also analyze the inherent non-stationarity of this multi-agent automatic curriculum teaching problem and provide a corresponding regret bound. Empirical results show that our method improves the performance, scalability and sample efficiency in several MARL environments.

What problem does this paper attempt to address?

The paper asks how to improve policy learning for large-scale multi-agent systems in sparse-reward environments in multi-agent reinforcement learning (MARL). Specifically, it identifies two main challenges that current MARL algorithms face at scale:

1. **Scalability and sparse rewards**: As the number of agents increases, the joint observation-action space grows exponentially, which makes effective policies hard to learn. In addition, sparse reward signals require a large number of training trajectories, which is an obstacle to applying existing MARL algorithms in complex environments.
2. **Limitations of automatic curriculum learning (ACL)**: ACL helps agents learn by gradually increasing task difficulty, but its applicability is limited in two ways:
   - There is no general student framework that handles both the varying number of agents across tasks and the sparse-reward problem.
   - The teacher's task is non-stationary because the student's policies are constantly changing.

To address these problems, the paper proposes a new automatic curriculum learning framework, Skilled Population Curriculum (SPC), which adapts curriculum learning to multi-agent coordination. The main components of SPC are:

- **Population-invariant communication**: The student is given population-invariant communication, so it can handle tasks with different numbers of agents.
- **Hierarchical skill set**: The student also has a hierarchical skill set, so it can learn cooperation and behavior skills from distinct tasks.
- **Contextual multi-armed bandit teacher**: The teacher is modeled as a contextual multi-armed bandit conditioned on student policies, so a team of agents can change its size while retaining previously acquired skills.
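The summary does not say how the population-invariant communication is implemented. As a minimal sketch, assuming mean pooling as a stand-in (the function `pool_messages` and the toy message vectors are hypothetical, not taken from the paper), the following shows one way a communication summary can keep the same size and ignore agent order as the team grows:

```python
from typing import List, Sequence


def pool_messages(messages: Sequence[Sequence[float]]) -> List[float]:
    """Average per-agent message vectors into one fixed-size vector.

    Assumption: mean pooling stands in for whatever communication
    mechanism the paper actually uses.
    """
    if not messages:
        raise ValueError("need at least one message")
    dim = len(messages[0])
    if any(len(m) != dim for m in messages):
        raise ValueError("all messages must share one dimension")
    n = len(messages)
    return [sum(m[i] for m in messages) / n for i in range(dim)]


# Hypothetical 2-dimensional messages from teams of different sizes.
three_agents = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
five_agents = three_agents + [[1.0, 1.0], [1.0, 1.0]]

pooled_3 = pool_messages(three_agents)
pooled_5 = pool_messages(five_agents)
```

Both `pooled_3` and `pooled_5` have length 2, so a policy network that consumes the pooled vector would not need a new input layer when the number of agents changes. Reordering the agents leaves the result unchanged, which is the invariance property the component's name refers to.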
Through these designs, SPC aims to improve the performance, scalability, and sample efficiency of multi-agent systems, especially in sparse-reward environments. The paper's experiments report improvements on all three in several MARL environments.
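The teacher-as-contextual-bandit idea can be illustrated with a toy example. This is only a sketch under stated assumptions: it uses a tabular epsilon-greedy bandit, which is not necessarily the algorithm in the paper, and the class `EpsilonGreedyTeacher`, the context labels, the task sizes, and the reward function `toy_progress` are all hypothetical. The context stands in for a summary of the student's current policy, and each arm is a task defined by a number of agents.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


class EpsilonGreedyTeacher:
    """Tabular contextual bandit: picks a task (arm) given a student context."""

    def __init__(self, tasks: List[int], epsilon: float = 0.1, seed: int = 0) -> None:
        self.tasks = list(tasks)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts: Dict[Tuple[Hashable, int], int] = defaultdict(int)
        self.values: Dict[Tuple[Hashable, int], float] = defaultdict(float)

    def choose_task(self, context: Hashable) -> int:
        # Explore with probability epsilon, otherwise pick the best-known task.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.tasks)
        return max(self.tasks, key=lambda t: self.values[(context, t)])

    def update(self, context: Hashable, task: int, reward: float) -> None:
        # Incremental running mean of the reward for this (context, task) pair.
        key = (context, task)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


def toy_progress(context: str, task: int) -> float:
    # Hypothetical reward: the student only benefits from the task size
    # that matches its current level.
    best = {"novice": 2, "expert": 8}[context]
    return 1.0 if task == best else 0.0


teacher = EpsilonGreedyTeacher(tasks=[2, 4, 8], epsilon=0.2, seed=0)
for step in range(500):
    context = "novice" if step % 2 == 0 else "expert"
    task = teacher.choose_task(context)
    teacher.update(context, task, toy_progress(context, task))
```

After training, the teacher's value table should favor the 2-agent task for the "novice" context and the 8-agent task for the "expert" context. In this toy the reward function is fixed, whereas the summary stresses that the real teacher's problem is non-stationary because the student keeps changing; the sketch does not model that.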

Towards Skilled Population Curriculum for Multi-Agent Reinforcement Learning

Multiagent Reinforcement Learning for Strictly Constrained Tasks Based on Reward Recorder

S2RL: Do We Really Need to Perceive All States in Deep Multi-Agent Reinforcement Learning?

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

Accelerate Multi-Agent Reinforcement Learning in Zero-Sum Games with Subgame Curriculum Learning

Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems

Neural Auto-Curricula

Curriculum Learning for Cooperation in Multi-Agent Reinforcement Learning

Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?

Skill matters: Dynamic skill learning for multi-agent cooperative reinforcement learning

Learning Curriculum Policies for Reinforcement Learning

LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning

Confidence-Based Curriculum Learning for Multi-Agent Path Finding

SC-MAIRL: Semi-Centralized Multi-Agent Imitation Reinforcement Learning

Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery

Automatic Curriculum Learning For Deep RL: A Short Survey

Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction

MARL-LNS: Cooperative Multi-agent Reinforcement Learning via Large Neighborhoods Search

CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models

MANSA: Learning Fast and Slow in Multi-Agent Systems