Abstract: Multiagent reinforcement learning (MARL) is commonly considered to suffer from non-stationary environments and exponentially increasing policy space. It would be even more challenging when rewards are sparse and delayed over long trajectories. In this paper, we study hierarchical deep MARL in cooperative multiagent problems with sparse and delayed reward. With temporal abstraction, we decompose the problem into a hierarchy of different time scales and investigate how agents can learn high-level coordination based on the independent skills learned at the low level. Three hierarchical deep MARL architectures are proposed to learn hierarchical policies under different MARL paradigms. Besides, we propose a new experience replay mechanism to alleviate the issue of the sparse transitions at the high level of abstraction and the non-stationarity of multiagent learning. We empirically demonstrate the effectiveness of our approaches in two domains with extremely sparse feedback: (1) a variety of Multiagent Trash Collection tasks, and (2) a challenging online mobile game, i.e., Fever Basketball Defense.

What problem does this paper attempt to address?

This paper addresses how to learn effectively in cooperative multi-agent reinforcement learning (MARL) tasks where rewards are sparse and delayed. It focuses on three challenges:

1. **Non-stationary environment**: Because every agent's policy keeps changing during learning, the environment is non-stationary from each agent's point of view, which makes learning harder.
2. **Exponential growth of the policy space**: The policy space grows exponentially with the number of agents, so traditional reinforcement learning methods are hard to apply directly.
3. **Sparse and delayed rewards**: In many real-world applications, reward signals are sparse and delayed, which makes learning harder still, especially for tasks that require long-term planning.

To address these challenges, the paper uses temporal abstraction and proposes hierarchical deep multi-agent reinforcement learning (hierarchical deep MARL). Temporal abstraction decomposes the problem into a hierarchy of time scales: agents learn independent skills at the low level and learn to coordinate at the high level, which reduces the learning difficulty. Specific contributions include:

- **Three hierarchical deep MARL architectures**: hierarchical independent learner (h-IL), hierarchical communication network (h-Comm), and hierarchical Qmix network (h-Qmix), each suited to a different MARL paradigm.
- **A new experience replay mechanism**: augmented concurrent experience replay (ACER), which alleviates sparse high-level transitions and non-stationarity by augmenting high-level experience replay and using concurrent sampling.
- **Experimental verification**: The approaches are evaluated in two domains with extremely sparse feedback: a variety of Multiagent Trash Collection tasks and the online mobile game Fever Basketball Defense.

With these methods, the paper shows that hierarchical deep MARL can learn cooperative strategies more effectively in environments with sparse and delayed rewards.
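
The summary names two ideas behind the replay mechanism: augmenting the few transitions available at the high level, and sampling concurrently across agents. The Python sketch below illustrates one plausible reading of both, not the paper's actual ACER implementation. All names (`HighLevelTransition`, `augment_segment`, `ConcurrentReplay`) are hypothetical, and it assumes that every agent starts and ends a skill at the same primitive steps, that augmentation means adding one transition per intermediate start step of a skill, and that concurrent sampling means drawing the same buffer rows for every agent.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class HighLevelTransition:
    """One temporally extended transition for a single agent (hypothetical type)."""
    obs: Tuple[float, ...]       # observation at the start of the (sub-)segment
    goal: int                    # high-level action, e.g. the index of a skill
    reward: float                # discounted reward accumulated until termination
    next_obs: Tuple[float, ...]  # observation when the skill terminates
    duration: int                # number of primitive steps covered


def augment_segment(
    observations: Sequence[Sequence[float]],
    goal: int,
    rewards: Sequence[float],
    gamma: float,
) -> List[HighLevelTransition]:
    """Turn one skill execution into len(rewards) high-level transitions.

    Assumption: besides the full segment, every suffix that starts at an
    intermediate step and ends at the same termination is also stored.
    `observations` must contain len(rewards) + 1 entries.
    """
    if len(observations) != len(rewards) + 1:
        raise ValueError("need one more observation than rewards")
    final_obs = tuple(observations[-1])
    out: List[HighLevelTransition] = []
    acc = 0.0
    # Walk backwards so each suffix return reuses the previous one.
    for k in range(len(rewards) - 1, -1, -1):
        acc = rewards[k] + gamma * acc
        out.append(
            HighLevelTransition(
                tuple(observations[k]), goal, acc, final_obs, len(rewards) - k
            )
        )
    out.reverse()  # the longest (original) segment comes first
    return out


class ConcurrentReplay:
    """Stores one row per time index; each row holds every agent's transition."""

    def __init__(self, n_agents: int, capacity: int) -> None:
        self.n_agents = n_agents
        self.rows = deque(maxlen=capacity)  # oldest rows are dropped first

    def add(self, joint: Sequence[HighLevelTransition]) -> None:
        if len(joint) != self.n_agents:
            raise ValueError("expected one transition per agent")
        self.rows.append(tuple(joint))

    def sample(self, batch_size: int, rng: random.Random):
        """Return one mini-batch per agent, all drawn from the same rows."""
        k = min(batch_size, len(self.rows))
        picked = rng.sample(list(self.rows), k)
        return [[row[i] for row in picked] for i in range(self.n_agents)]
```

Because position `j` in every agent's mini-batch comes from the same stored row, each agent trains on experience gathered while its teammates were following the corresponding policies. That is the intuition usually given for why concurrent sampling eases non-stationarity. The augmentation step simply multiplies the number of high-level samples obtained from each skill execution.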

Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction

Learning to Cooperate: Application of Deep Reinforcement Learning for Online AGV Path Finding.

Hierarchical Coordination Multi-Agent Reinforcement Learning with Spatio-Temporal Abstraction

Multiagent Reinforcement Learning for Strictly Constrained Tasks Based on Reward Recorder

Off-Beat Multi-Agent Reinforcement Learning

Hierarchical Multi-Agent Reinforcement Learning for Cooperative Tasks with Sparse Rewards in Continuous Domain

HELSA: Hierarchical Reinforcement Learning with Spatiotemporal Abstraction for Large-Scale Multi-Agent Path Finding

Decentralized Multi-agent Reinforcement Learning with Multi-time Scale of Decision Epochs

Hierarchical State Extraction Method for Multi-Agent Environment

Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery

Guiding Multi-agent Multi-task Reinforcement Learning by a Hierarchical Framework with Logical Reward Shaping

Self-Clustering Hierarchical Multi-Agent Reinforcement Learning with Extensible Cooperation Graph

Hierarchical Method for Cooperative Multiagent Reinforcement Learning in Markov Decision Processes

Multiexperience-Assisted Efficient Multiagent Reinforcement Learning

Multi-agent reinforcement learning with synchronized and decomposed reward automaton synthesized from reactive temporal logic

Multi-Agent Reinforcement Learning in Time-varying Networked Systems

Hierarchical relationship modeling in multi-agent reinforcement learning for mixed cooperative–competitive environments

Agent-Time Attention for Sparse Rewards Multi-Agent Reinforcement Learning

Hierarchical Task Network Planning for Facilitating Cooperative Multi-Agent Reinforcement Learning

Weighted Mean-Field Multi-Agent Reinforcement Learning via Reward Attribution Decomposition

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem