Abstract: Many cooperative multi-agent problems require agents to learn individual tasks while contributing to the collective success of the group. This is a challenging task for current state-of-the-art multi-agent reinforcement learning algorithms, which are designed to maximize either the global reward of the team or the individual local rewards. The problem is exacerbated when either of the rewards is sparse, leading to unstable learning. To address this problem, we present Decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG): a novel cooperative multi-agent reinforcement learning framework that simultaneously learns to maximize the global and local rewards. We evaluate our solution on the challenging defensive escort team problem and show that our solution achieves significantly better and more stable performance than a direct adaptation of the MADDPG algorithm.

What problem does this paper attempt to address?

This paper addresses how agents in multi-agent reinforcement learning (MARL) can learn individual tasks while also contributing to the overall success of the team. Current state-of-the-art MARL algorithms are designed to maximize either the team's global reward or the agents' individual local rewards, and learning becomes unstable when either reward signal is sparse. To solve this problem, the authors propose Decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG), a novel cooperative MARL framework that learns to maximize the global and local rewards simultaneously.

Specifically, the paper points out that in cooperative multi-agent problems each agent must maximize both its own gain (the local reward) and the collective success of the team (the global reward). In the defensive escort team problem, for example, each agent must keep a specific distance from the escorted payload so as not to violate any social norms, without compromising the payload's safety. Although MARL has been successful in multi-player games, learning cooperation while also maximizing local rewards remains an open challenge. In such problems, each agent explicitly receives two reward signals: the team's global reward and its own local reward.

To address these challenges, the paper proposes a dual-critic framework (DE-MADDPG): one critic is trained to evaluate the global reward and another to evaluate the local reward. This avoids having to hand-design an entangled multi-objective reward function.
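A minimal sketch of why two critics let an agent trade off team and individual goals, under our own simplifying assumptions: the action is a single float, the critics are plain Python functions rather than neural networks, and the actor follows the sum of the global and local critics' action-gradients (our reading of the decomposition, not the authors' code). The names `q_g`, `q_l`, `combined_action_gradient`, and `improve_action` are hypothetical.

```python
from typing import Callable

# Hypothetical critic signature: (state, action) -> estimated return.
Critic = Callable[[float, float], float]


def combined_action_gradient(q_global: Critic, q_local: Critic,
                             state: float, action: float,
                             eps: float = 1e-4) -> float:
    """Central finite-difference estimate of d/da [Q_g(s, a) + Q_l(s, a)].

    A real implementation would backpropagate through two separate critic
    networks; finite differences keep this sketch dependency-free.
    """
    def total(a: float) -> float:
        return q_global(state, a) + q_local(state, a)

    return (total(action + eps) - total(action - eps)) / (2.0 * eps)


def improve_action(q_global: Critic, q_local: Critic, state: float,
                   action: float, lr: float = 0.1, steps: int = 200) -> float:
    """Plain gradient ascent on the action, standing in for an actor update."""
    for _ in range(steps):
        action += lr * combined_action_gradient(q_global, q_local, state, action)
    return action


def q_g(s: float, a: float) -> float:
    # Toy global critic: the team is best served by action 1.0.
    return -(a - 1.0) ** 2


def q_l(s: float, a: float) -> float:
    # Toy local critic: the agent's own task prefers action 0.0.
    return -(a - 0.0) ** 2


best = improve_action(q_g, q_l, state=0.0, action=-2.0)
print(round(best, 3))  # 0.5
```

With equally weighted toy critics, the actor settles halfway between the team optimum and the individual optimum; neither reward has to be folded into a single hand-tuned scalar.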
The decomposed design not only improves learning stability but also makes it possible to apply performance-enhancing techniques such as Prioritized Experience Replay (PER) and Twin Delayed Deep Deterministic Policy Gradient (TD3), the latter of which counters overestimation bias in the Q-function. Experimental results show that DE-MADDPG significantly outperforms a direct adaptation of the MADDPG algorithm on the defensive escort team problem and learns more stably.
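The two add-on techniques are standard and independent of DE-MADDPG. Below is a minimal scalar sketch of each, assuming a 1-D continuous action in [-1, 1] and plain callables in place of target networks: TD3's clipped double-Q target with target-policy smoothing, and proportional PER sampling probabilities with importance-sampling weights. The defaults (`gamma=0.99`, `noise_std=0.2`, `noise_clip=0.5`, `alpha=0.6`, `beta=0.4`) are common choices from the original TD3 and PER papers, not values reported for DE-MADDPG, and all function names are hypothetical. How the paper wires these into its global and local critics is not shown here.

```python
import random
from typing import Callable, List, Optional, Sequence

QFn = Callable[[float, float], float]  # (state, action) -> Q-value


def td3_target(reward: float, next_state: float, done: bool,
               target_actor: Callable[[float], float],
               target_q1: QFn, target_q2: QFn,
               gamma: float = 0.99, noise_std: float = 0.2,
               noise_clip: float = 0.5, action_low: float = -1.0,
               action_high: float = 1.0,
               rng: Optional[random.Random] = None) -> float:
    """TD3 critic target: r + gamma * min(Q1', Q2') at a smoothed target action."""
    rng = rng or random.Random()
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    next_action = max(action_low, min(action_high, target_actor(next_state) + noise))
    # Clipped double-Q: the smaller estimate curbs overestimation bias.
    next_q = min(target_q1(next_state, next_action),
                 target_q2(next_state, next_action))
    return reward + (0.0 if done else gamma * next_q)


def per_probabilities(td_errors: Sequence[float], alpha: float = 0.6,
                      eps: float = 1e-6) -> List[float]:
    """Proportional PER: P(i) = p_i^alpha / sum_k p_k^alpha with p_i = |delta_i| + eps."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def importance_weights(probs: Sequence[float], beta: float = 0.4) -> List[float]:
    """IS correction w_i = (N * P(i))^-beta, normalized by the largest weight."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    biggest = max(raw)
    return [w / biggest for w in raw]


# Toy target networks: the two critics disagree; the target uses the lower one.
def actor(s: float) -> float:
    return 0.3


def q1(s: float, a: float) -> float:
    return 10.0 + a


def q2(s: float, a: float) -> float:
    return 8.0 + a


y = td3_target(reward=1.0, next_state=0.0, done=False, target_actor=actor,
               target_q1=q1, target_q2=q2, gamma=0.9, noise_std=0.0)
print(round(y, 4))  # 8.47

probs = per_probabilities([2.0, 0.5, 0.0], alpha=1.0)
weights = importance_weights(probs)
```

Transitions with larger TD errors are replayed more often, and the importance weights scale down their updates so the sampling skew does not bias the learned Q-values.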

Multi-Agent Reinforcement Learning for Problems with Combined Individual and Team Reward