Abstract: In real-world multi-robot systems, performing high-quality, collaborative behaviors requires robots to asynchronously reason about high-level action selection at varying time durations. Macro-Action Decentralized Partially Observable Markov Decision Processes (MacDec-POMDPs) provide a general framework for asynchronous decision making under uncertainty in fully cooperative multi-agent tasks. However, multi-agent deep reinforcement learning methods have only been developed for (synchronous) primitive-action problems. This paper proposes two Deep Q-Network (DQN) based methods for learning decentralized and centralized macro-action-value functions with novel macro-action trajectory replay buffers introduced for each case. Evaluations on benchmark problems and a larger domain demonstrate the advantage of learning with macro-actions over primitive-actions and the scalability of our approaches.
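For the decentralized case, the idea of a macro-action trajectory replay buffer can be illustrated with a small sketch. It is a hypothetical illustration, not the paper's implementation. It assumes that each agent logs one record per primitive time step, that a sampled trajectory starts at a macro-action boundary, and that the value target takes the SMDP-style form r^c + γ^τ max Q(o', m'); the paper's exact discounting may differ. The names `Step`, `MacroTransition`, `squeeze` and `macro_target` are invented for this example, and a plain callable `q` stands in for the agent's (recurrent) Q-network.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Step:
    """One primitive time step from a single agent's trajectory (hypothetical layout)."""
    obs: int            # agent's observation at this step (an int id for simplicity)
    macro_action: int   # macro-action the agent is currently executing
    reward: float       # shared team reward at this step
    next_obs: int       # agent's observation after the step
    terminated: bool    # True if the agent's macro-action ended at this step


@dataclass
class MacroTransition:
    obs: int            # observation when the macro-action was selected
    macro_action: int
    cum_reward: float   # discounted reward accumulated while the macro-action ran
    duration: int       # number of primitive steps (tau) the macro-action lasted
    next_obs: int       # observation when the macro-action terminated


def squeeze(traj: Sequence[Step], gamma: float) -> List[MacroTransition]:
    """Collapse a primitive-step trajectory into one transition per completed macro-action.

    Assumes traj starts at a macro-action boundary; a macro-action still
    running at the end of traj is dropped because its return is incomplete.
    """
    out: List[MacroTransition] = []
    start_obs, cum, tau = None, 0.0, 0
    for step in traj:
        if start_obs is None:  # a new macro-action starts at this step
            start_obs, cum, tau = step.obs, 0.0, 0
        cum += (gamma ** tau) * step.reward
        tau += 1
        if step.terminated:
            out.append(MacroTransition(start_obs, step.macro_action, cum, tau, step.next_obs))
            start_obs = None
    return out


def macro_target(t: MacroTransition, q: Callable[[int], Sequence[float]],
                 gamma: float, done: bool = False) -> float:
    """Assumed SMDP-style target: r^c + gamma^tau * max_m' Q(o', m')."""
    if done:
        return t.cum_reward
    return t.cum_reward + (gamma ** t.duration) * max(q(t.next_obs))
```

Usage: with γ = 0.5, a macro-action that lasts two steps with reward 1.0 at each step yields `cum_reward == 1.5` and `duration == 2`, so its target bootstraps from the next value discounted by 0.25. Training on these squeezed transitions means each agent only updates its value at its own macro-action decision points, which is what lets agents act asynchronously.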

What problem does this paper attempt to address?

This paper addresses how multi-robot systems can achieve high-quality collaborative behavior when robots select and execute high-level actions (macro-actions) asynchronously. Existing multi-agent deep reinforcement learning methods focus on synchronous, primitive-action problems, whereas robots in practical applications usually need to select and complete high-level actions independently, over time intervals of varying length. The paper therefore proposes two Deep Q-Network (DQN) based methods for learning decentralized and centralized macro-action-value functions, and introduces Macro-Action Concurrent Experience Replay Trajectories (Mac-CERTs) and Macro-Action Joint Experience Replay Trajectories (Mac-JERTs) to support them.

### Specific Problems and Solutions

1. **Asynchronous decision-making and execution**
   - **Problem**: Existing methods assume that all agents execute primitive actions synchronously, whereas in real-world cooperative tasks robots often select and complete actions at different times.
   - **Solution**: Use macro-actions, which let each agent select and terminate actions at its own times and represent high-level control behaviors (such as navigating to a target point or grasping an object) more naturally.
2. **Multi-agent reinforcement learning under partial observability**
   - **Problem**: How to learn and make decisions effectively in a partially observable multi-agent environment.
   - **Solution**: Build on the MacDec-POMDP (Macro-Action Decentralized Partially Observable Markov Decision Process) framework and propose DQN-based learning methods suited to this setting.
3. **Decentralized and centralized learning**
   - **Problem**: How to learn effectively in the decentralized and the centralized case respectively.
   - **Solution**:
     - **Decentralized learning**: A DQN-based decentralized macro-action learning method that uses Mac-CERTs, so that each agent maintains its own macro-action trajectory.
     - **Centralized learning**: A DQN-based centralized macro-action learning method that uses Mac-JERTs to preserve the timing information of macro-action trajectories, together with a conditional target prediction method for learning the centralized joint macro-action-value function.

### Experimental Verification

The methods are evaluated on benchmark problems and a larger domain. The results show that learning with macro-actions outperforms learning with primitive actions and that the approaches scale to the larger domain.

### Summary

By introducing macro-actions and corresponding learning mechanisms, the paper addresses the problem of learning asynchronous collaborative behavior in multi-robot systems, particularly in partially observable multi-agent environments, and provides a more efficient and flexible approach.
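The centralized side of the method, joint macro-action trajectories and conditional target prediction, can be sketched in a few lines. This is a minimal, hypothetical reading rather than the paper's code. It assumes that a joint trajectory is "squeezed" to the primitive steps at which at least one agent's macro-action ends, and that at the next decision point any agent still executing a macro-action is locked into it, so the max in the target ranges only over joint macro-actions consistent with those running macro-actions. The function names are invented for illustration, the dict `next_q` stands in for the centralized network's outputs at the next joint decision point, and the γ^τ discount is an assumption.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

JointMacroAction = Tuple[int, ...]


def joint_decision_points(terminated: Sequence[Sequence[bool]]) -> List[int]:
    """Step indices at which at least one agent's macro-action terminates.

    terminated[t][i] is True if agent i's macro-action ended at primitive step t.
    Keeping only these steps shortens the joint trajectory while preserving
    the timing of every agent's macro-action boundaries.
    """
    return [t for t, flags in enumerate(terminated) if any(flags)]


def still_running(flags: Sequence[bool], current: JointMacroAction) -> List[Optional[int]]:
    """Macro-action each agent is locked into at the next decision point (None = free to choose)."""
    return [None if done else m for done, m in zip(flags, current)]


def conditional_target(cum_reward: float, duration: int,
                       next_q: Dict[JointMacroAction, float],
                       running: Sequence[Optional[int]], n_actions: Sequence[int],
                       gamma: float, terminal: bool = False) -> float:
    """Target whose max only ranges over joint macro-actions that keep running agents fixed."""
    if terminal:
        return cum_reward
    best = float("-inf")
    for joint in product(*(range(n) for n in n_actions)):
        # Skip joint macro-actions that would switch an agent that is still executing.
        if any(r is not None and a != r for a, r in zip(joint, running)):
            continue
        best = max(best, next_q[joint])
    return cum_reward + (gamma ** duration) * best
```

In a two-agent toy example where agent 0 is still executing macro-action 1, only the joint macro-actions `(1, 0)` and `(1, 1)` are considered. With `next_q = {(0, 0): 1.0, (0, 1): 5.0, (1, 0): 3.0, (1, 1): 2.0}`, a cumulative reward of 1.0, τ = 1 and γ = 0.5, the conditional target is 2.5, whereas an unconstrained max would give 3.5. Restricting the max keeps the bootstrap target consistent with what the team can actually do at the next decision point.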

Macro-Action-Based Deep Multi-Agent Reinforcement Learning

Macro-Action-Based Multi-Agent/Robot Deep Reinforcement Learning under Partial Observability

Multi-Agent/Robot Deep Reinforcement Learning with Macro-Actions (Student Abstract)

Observer-Based Multiagent Deep Reinforcement Learning: A Fully Distributed Training Scheme

Learning for Decentralized Control of Multiagent Systems in Large, Partially-Observable Stochastic Environments

Deep Reinforcement Learning With Macro-Actions

Decentralized control of multi-robot partially observable Markov decision processes using belief space macro-actions

Decentralized Control of Partially Observable Markov Decision Processes using Belief Space Macro-actions

Decentralized Multi-Agent Reinforcement Learning: An Off-Policy Method

Macro Action Reinforcement Learning with Sequence Disentanglement using Variational Autoencoder

Multi-agent Reinforcement Learning with Deep Networks for Diverse Q-Vectors

Effective Multi-Agent Deep Reinforcement Learning Control with Relative Entropy Regularization

On Improving Deep Reinforcement Learning for POMDPs

R-MADDPG for Partially Observable Environments and Limited Communication

Learning Multi-Agent Cooperation via Considering Actions of Teammates

UAV Cooperative Air Combat Maneuvering Confrontation Based on Multi-agent Reinforcement Learning

UAV Swarm Confrontation Using Hierarchical Multiagent Reinforcement Learning

A Compression-Inspired Framework for Macro Discovery

Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient

Decentralized Multi-agent Reinforcement Learning with Multi-time Scale of Decision Epochs

Decentralized Multi-Agent Reinforcement Learning with Global State Prediction