Abstract: Recently, multiagent reinforcement learning (MARL) has shown great potential for learning cooperative policies in multiagent systems (MASs). However, a notable drawback of current MARL is its low sample efficiency, which demands a very large number of interactions with the environment and greatly hinders real-world application. Incorporating prior experience effectively can help MARL find good solutions quickly and thereby alleviate this drawback. In this article, a novel multiexperience-assisted reinforcement learning (MEARL) method is proposed to improve the learning efficiency of MASs. Specifically, a monotonicity-constrained reward shaping scheme is designed from expert experience to provide additional individual rewards that guide multiagent learning efficiently, while guaranteeing that the team optimization objective remains invariant. Furthermore, a reward distribution estimator is developed to model the implicit reward distribution of the environment from transition experience, that is, collected samples consisting of a state-action pair, a reward, and a next state. This estimator predicts the expected reward of each agent for the action taken, so that the state value function is estimated more accurately and converges faster. The performance of MEARL is evaluated on two multiagent environment platforms: our designed unmanned aerial vehicle combat (UAV-C) and StarCraft II Micromanagement (SCII-M). Simulation results demonstrate that the proposed MEARL greatly improves the learning efficiency and performance of MASs and outperforms state-of-the-art methods in multiagent tasks.
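As a rough illustration of the idea behind the reward distribution estimator, the sketch below keeps a running mean of observed rewards per agent, state, and action, and returns that mean as the expected reward. This is a deliberately simple tabular stand-in, not the estimator proposed in the article, whose architecture the abstract does not specify. The class name `TabularRewardEstimator`, its methods, and the example state and action labels are all hypothetical.

```python
from collections import defaultdict


class TabularRewardEstimator:
    """Hypothetical stand-in: running mean of rewards per (agent, state, action).

    Assumes hashable, discrete states and actions. The article's estimator
    is not described in the abstract and may differ entirely.
    """

    def __init__(self):
        self._count = defaultdict(int)    # number of samples seen per key
        self._mean = defaultdict(float)   # running mean reward per key

    def update(self, agent_id, state, action, reward):
        """Fold one transition sample into the running mean."""
        key = (agent_id, state, action)
        self._count[key] += 1
        # Incremental mean: m_n = m_{n-1} + (r - m_{n-1}) / n
        self._mean[key] += (reward - self._mean[key]) / self._count[key]

    def expected_reward(self, agent_id, state, action, default=0.0):
        """Return the estimated expected reward, or `default` if unseen."""
        key = (agent_id, state, action)
        if key not in self._count:
            return default
        return self._mean[key]


# Usage: three samples for agent 0 taking "attack" in state "s0".
estimator = TabularRewardEstimator()
for r in (1.0, 0.0, 2.0):
    estimator.update(0, "s0", "attack", r)
estimate = estimator.expected_reward(0, "s0", "attack")
unseen = estimator.expected_reward(1, "s0", "attack")
```

Averaging over collected samples smooths out per-step reward noise, which is one plausible reason an expected-reward prediction could stabilize value targets. A practical estimator for continuous states would replace the table with a learned function approximator.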

State-based episodic memory for multi-agent reinforcement learning

Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning

S2RL: Do We Really Need to Perceive All States in Deep Multi-Agent Reinforcement Learning?

Episodic Reinforcement Learning with Associative Memory

Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration

Multiexperience-Assisted Efficient Multiagent Reinforcement Learning

Deep Reinforcement Learning with Parametric Episodic Memory

Dual Memory Model for Experience-Once Task-Incremental Lifelong Learning

Sequential memory improves sample and memory efficiency in Episodic Control

SC-MAIRL: Semi-Centralized Multi-Agent Imitation Reinforcement Learning

Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?

Episodic Memory Deep Q-Networks

Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

Sample-efficient multi-agent reinforcement learning with masked reconstruction

Multi-agent Exploration with Sub-state Entropy Estimation

Multiple Unmanned Aerial Vehicle (multi-UAV) Reconnaissance and Search with Limited Communication Range Using Semantic Episodic Memory in Reinforcement Learning

Attentive Relational State Representation in Decentralized Multiagent Reinforcement Learning

Two-Memory Reinforcement Learning

Multiagent Reinforcement Learning for Strictly Constrained Tasks Based on Reward Recorder