Abstract: In multi-agent reinforcement learning, optimal control with robustness guarantees is critical for deployment in the real world. However, existing methods face challenges related to sample complexity, training instability, potential convergence to a suboptimal Nash equilibrium, and non-robustness to multiple perturbations. In this paper, we propose a unified framework for learning \emph{stochastic} policies to resolve these issues. We embed cooperative MARL problems into probabilistic graphical models, from which we derive the maximum entropy (MaxEnt) objective optimal for MARL. Based on the MaxEnt framework, we propose the \emph{Heterogeneous-Agent Soft Actor-Critic} (HASAC) algorithm. Theoretically, we prove that HASAC enjoys monotonic improvement and converges to a \emph{quantal response equilibrium} (QRE). Furthermore, HASAC is provably robust against a wide range of real-world uncertainties, including perturbations in rewards, environment dynamics, states, and actions. Finally, we generalize this into a unified template for MaxEnt algorithmic design, named \emph{Maximum Entropy Heterogeneous-Agent Mirror Learning} (MEHAML), which provides any induced method with the same guarantees as HASAC. We evaluate HASAC on seven benchmarks: Bi-DexHands, Multi-Agent MuJoCo, Pursuit-Evade, StarCraft Multi-Agent Challenge, Google Research Football, Multi-Agent Particle Environment, and Light Aircraft Game. Results show that HASAC outperforms strong baselines in 34 out of 38 tasks, exhibiting improved training stability, better sample efficiency, and sufficient exploration. The robustness of HASAC is further validated under uncertainties in rewards, dynamics, states, and actions at 14 magnitudes, and in a real-world deployment in a multi-robot arena against these four types of uncertainty. See our page at \url{https://sites.google.com/view/meharl}.

Multi-agent Exploration with Sub-state Entropy Estimation

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure

Self-Motivated Multi-Agent Exploration

Exploiting Semantic Epsilon Greedy Exploration Strategy in Multi-Agent Reinforcement Learning

S2RL: Do We Really Need to Perceive All States in Deep Multi-Agent Reinforcement Learning?

Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration

Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks

Multiexperience-Assisted Efficient Multiagent Reinforcement Learning

MAexp: A Generic Platform for RL-based Multi-Agent Exploration

Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration

Settling Decentralized Multi-Agent Coordinated Exploration by Novelty Sharing

MuDE: Multi-agent decomposed reward-based exploration

CMBE: Curiosity-driven Model-Based Exploration for Multi-Agent Reinforcement Learning in Sparse Reward Settings

Imagine, Initialize, and Explore: An Effective Exploration Method in Multi-Agent Reinforcement Learning

Efficient Multi-Agent Exploration with Mutual-Guided Actor-Critic

Robust Multi-Agent Control via Maximum Entropy Heterogeneous-Agent Reinforcement Learning

Boosting Value Decomposition Via Unit-Wise Attentive State Representation for Cooperative Multi-Agent Reinforcement Learning

Strangeness-driven exploration in multi-agent reinforcement learning

Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning