Abstract: Existing value-based algorithms for cooperative multi-agent reinforcement learning (MARL) commonly rely on random exploration, such as $\epsilon$-greedy, to explore the environment. However, such exploration is inefficient at finding effective joint actions in states that require the cooperation of multiple agents. In this work, we propose ensemble value functions for multi-agent exploration (EMAX), a general framework that extends value-based MARL algorithms with ensembles of value functions. EMAX leverages the ensemble of value functions to guide the exploration of agents, stabilise their optimisation, and make their policies more robust to miscoordination. These benefits are achieved by combining three techniques. (1) EMAX uses the uncertainty of value estimates across the ensemble in a UCB policy to guide exploration. This exploration policy focuses on parts of the environment that require cooperation across agents and thus enables agents to learn to cooperate more efficiently. (2) During optimisation, EMAX computes target values as average value estimates across the ensemble. These targets exhibit lower variance than commonly applied target networks, which yields significant benefits in MARL, where training commonly suffers from high variance caused by the exploration and non-stationary policies of other agents. (3) During evaluation, EMAX selects actions by a majority vote across the ensemble, which reduces the likelihood of selecting sub-optimal actions. We instantiate three value-based MARL algorithms with EMAX (independent DQN, VDN, and QMIX) and evaluate them in 21 tasks across four environments. Using ensembles of five value functions, EMAX improves the sample efficiency and final evaluation returns of these algorithms by 60%, 47%, and 539%, respectively, averaged across the 21 tasks.
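The three techniques named in the abstract can be illustrated for a single agent with a minimal sketch. This is not the authors' implementation: the function names, the `beta` exploration weight, the use of the population standard deviation as the uncertainty measure, taking the maximum over ensemble-averaged values in the target, and breaking vote ties by lowest action index are all assumptions made for illustration. The abstract only states that uncertainty enters a UCB policy, that targets are ensemble averages, and that evaluation uses a majority vote.

```python
from collections import Counter
from statistics import mean, pstdev


def ucb_action(q_ensemble, beta=1.0):
    """Pick the action maximising mean + beta * std across ensemble members.

    q_ensemble: list of K lists, one per ensemble member, each holding
    that member's value estimate for every action (hypothetical layout).
    """
    n_actions = len(q_ensemble[0])
    scores = []
    for a in range(n_actions):
        estimates = [q[a] for q in q_ensemble]
        # Disagreement across members serves as the exploration bonus (assumption).
        scores.append(mean(estimates) + beta * pstdev(estimates))
    return max(range(n_actions), key=scores.__getitem__)


def ensemble_target(reward, next_q_ensemble, gamma=0.99, done=False):
    """One-step target bootstrapped from ensemble-averaged next-state values."""
    n_actions = len(next_q_ensemble[0])
    avg_q = [mean(q[a] for q in next_q_ensemble) for a in range(n_actions)]
    # Assumption: greedy maximum is taken over the averaged estimates.
    return reward + (0.0 if done else gamma * max(avg_q))


def majority_vote_action(q_ensemble):
    """Each member votes for its greedy action; ties go to the lowest index."""
    votes = Counter(max(range(len(q)), key=q.__getitem__) for q in q_ensemble)
    top = max(votes.values())
    return min(a for a, count in votes.items() if count == top)


# Three ensemble members, three actions (made-up numbers).
q_ensemble = [
    [1.0, 0.5, 0.0],
    [1.0, 0.0, 2.0],
    [1.0, 0.9, 0.2],
]
explore = ucb_action(q_ensemble, beta=1.0)    # favours the action members disagree on
greedy = ucb_action(q_ensemble, beta=0.0)     # reduces to the mean-greedy action
evaluate = majority_vote_action(q_ensemble)   # action most members prefer
target = ensemble_target(1.0, q_ensemble, gamma=0.5)
```

In this toy example the members agree on action 0 but disagree strongly on action 2, so the UCB rule explores action 2 while the majority vote and the mean-greedy rule both select action 0. In the multi-agent setting described in the abstract, each agent would hold its own ensemble, and algorithms such as VDN or QMIX would additionally mix per-agent values, which this sketch omits.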

Option-based Multi-agent Exploration

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

Self-Motivated Multi-Agent Exploration

An Autonomous Non-monolithic Agent with Multi-mode Exploration based on Options Framework

MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure

Two Heads Are Better Than One: A Simple Exploration Framework for Efficient Multi-Agent Reinforcement Learning

Multi-agent Exploration with Sub-state Entropy Estimation

Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks

Influence-Based Multi-Agent Exploration

MAexp: A Generic Platform for RL-based Multi-Agent Exploration

Learning to explore by reinforcement over high-level options

Imagine, Initialize, and Explore: An Effective Exploration Method in Multi-Agent Reinforcement Learning

Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration

Exploration in Deep Reinforcement Learning: From Single-Agent to Multiagent Domain

Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration

Efficient Multi-Agent Exploration with Mutual-Guided Actor-Critic

Settling Decentralized Multi-Agent Coordinated Exploration by Novelty Sharing

Ensemble Value Functions for Efficient Exploration in Multi-Agent Reinforcement Learning

Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning
