Abstract: Sample usage efficiency is an important factor in the convergence speed of multi-agent deep reinforcement learning (MADRL) algorithms. Most existing experience replay (ER) methods manually select the experience samples used to update an agent's policy. Manual selection struggles to supply suitable and efficient samples for the different stages of policy learning, and it does not effectively mine the potential value of the samples in the replay buffer. Inspired by recommendation systems, this paper proposes a MADRL framework based on reinforcement recommendation and group modification that improves sample usage efficiency and the multi-agent system's ability to find the optimal solution across different categories of task scenarios. First, a recommendation network outputs a sampling probability for each experience sample, and samples are drawn according to these probabilities instead of by manual sampling; at the same time, we collect the performance of the multi-agent system after it updates its policy with the recommended samples and use it to construct the reinforcement learning process of the recommendation network. Next, we modify each agent's individual policy according to the group rewards to improve the agent's ability to learn the optimal solution. We then combine the reinforcement recommendation and group modification modules and embed them into the MADRL algorithm MAAC. Finally, we run experiments on task scenarios including cooperative collection, command movement, and target navigation, and extend the framework to the MADDPG algorithm to verify its scalability. The experimental results show that off-policy MADRL algorithms combined with the proposed framework outperform the baseline algorithms in sample usage efficiency and generalize better across numbers of agents and scene categories.
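
The recommendation-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the recommendation network is replaced by a fixed list of hypothetical scores, the softmax normalization is an assumption made here for illustration, and the names `sampling_probabilities` and `recommend_batch` are invented for this example.

```python
import math
import random


def sampling_probabilities(scores):
    """Turn per-sample scores into a probability distribution (softmax, assumed here)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def recommend_batch(buffer, scores, batch_size, rng):
    """Draw a minibatch, with replacement, according to the recommended probabilities."""
    probs = sampling_probabilities(scores)
    indices = rng.choices(range(len(buffer)), weights=probs, k=batch_size)
    return indices, [buffer[i] for i in indices]


# Hypothetical replay buffer of (state, action, reward, next_state) tuples.
buffer = [(i, 0, float(i), i + 1) for i in range(5)]
# Hypothetical scores standing in for the recommendation network's output.
scores = [0.1, 0.5, 2.0, 0.3, 1.2]

rng = random.Random(0)  # seeded so the draw is reproducible
probs = sampling_probabilities(scores)
indices, batch = recommend_batch(buffer, scores, batch_size=3, rng=rng)
```

In the framework the abstract describes, the scores would come from a learned network, and the system's performance after updating on `batch` would serve as the feedback signal for training that network; neither part is modeled in this sketch.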

HiMacMic: Hierarchical Multi-Agent Deep Reinforcement Learning with Dynamic Asynchronous Macro Strategy

A Dynamically Adaptive Approach to Reducing Strategic Interference for Multi-agent Systems

Learning Macromanagement in Starcraft by Deep Reinforcement Learning

Macro-Action-Based Multi-Agent/Robot Deep Reinforcement Learning under Partial Observability

Macro-Action-Based Deep Multi-Agent Reinforcement Learning

Cooperative multi-agent game based on reinforcement learning

Dueling Network Architecture for Multi-Agent Deep Deterministic Policy Gradient

Sample-efficient multi-agent reinforcement learning with masked reconstruction

MARRGM: Learning Framework for Multi-Agent Reinforcement Learning via Reinforcement Recommendation and Group Modification

A further exploration of deep Multi-Agent Reinforcement Learning with Hybrid Action Space

Hierarchical Method for Cooperative Multiagent Reinforcement Learning in Markov Decision Processes

Efficient Multi-Agent Exploration with Mutual-Guided Actor-Critic

Multi-agent Deep Reinforcement Learning Based on Maximum Entropy

Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient

ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

Hierarchical Reinforcement Learning for Multi-agent MOBA Game

Coordinating Multi-Agent Deep Reinforcement Learning in Wargame

Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction

Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces

Effective Multi-Agent Deep Reinforcement Learning Control with Relative Entropy Regularization

Hierarchical Macro Strategy Model for MOBA Game AI