Abstract: In multi-agent reinforcement learning (MARL), optimal control with robustness guarantees is critical for real-world deployment. However, existing methods face challenges related to sample complexity, training instability, potential convergence to suboptimal Nash equilibria, and non-robustness to multiple perturbations. In this paper, we propose a unified framework for learning stochastic policies to resolve these issues. We embed cooperative MARL problems into probabilistic graphical models, from which we derive the maximum entropy (MaxEnt) objective for MARL. Based on the MaxEnt framework, we propose the Heterogeneous-Agent Soft Actor-Critic (HASAC) algorithm. Theoretically, we prove that HASAC enjoys monotonic improvement and converges to a quantal response equilibrium (QRE). Furthermore, HASAC is provably robust against a wide range of real-world uncertainties, including perturbations in rewards, environment dynamics, states, and actions. Finally, we generalize these results into a unified template for MaxEnt algorithmic design named Maximum Entropy Heterogeneous-Agent Mirror Learning (MEHAML), which provides any induced method with the same guarantees as HASAC. We evaluate HASAC on seven benchmarks: Bi-DexHands, Multi-Agent MuJoCo, Pursuit-Evade, StarCraft Multi-Agent Challenge, Google Research Football, Multi-Agent Particle Environment, and Light Aircraft Game. Results show that HASAC consistently outperforms strong baselines in 34 out of 38 tasks, exhibiting improved training stability, better sample efficiency, and sufficient exploration. The robustness of HASAC was further validated under uncertainties in rewards, dynamics, states, and actions at 14 magnitudes, and through real-world deployment in a multi-robot arena against these four types of uncertainties. See our page at https://sites.google.com/view/meharl.
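As a rough illustration of the quantal response equilibrium (QRE) mentioned in the abstract, here is a toy sketch. It is not HASAC itself. It assumes a single-state, two-agent cooperative matrix game with a shared payoff, and logit (softmax) responses at a temperature `alpha`. For fixed action values x, the policy softmax(x / alpha) maximizes the entropy-regularized objective pi·x + alpha·H(pi), which is a one-step analogue of a MaxEnt objective. A logit QRE is a joint policy in which every agent plays this softmax response to the others' policies. The helper names `softmax` and `logit_qre`, the example payoff, and the damped fixed-point iteration are illustrative choices and do not come from the paper.

```python
import numpy as np


def softmax(x, alpha):
    """Logit response: the maximizer over distributions pi of pi.x + alpha * H(pi)."""
    z = x / alpha
    z = z - z.max()  # shift for numerical stability; does not change the result
    e = np.exp(z)
    return e / e.sum()


def logit_qre(payoff, alpha=1.0, damping=0.5, iters=2000, tol=1e-10):
    """Approximate a logit QRE of a two-agent cooperative matrix game.

    Illustrative sketch (hypothetical helper, not HASAC):
    payoff[a1, a2] is the shared team reward. Each agent repeatedly moves
    part of the way (controlled by `damping`) toward its softmax response
    to the other agent's current policy, until the policies stop changing.
    """
    n1, n2 = payoff.shape
    pi1 = np.full(n1, 1.0 / n1)  # start from uniform policies
    pi2 = np.full(n2, 1.0 / n2)
    for _ in range(iters):
        q1 = payoff @ pi2    # expected reward of each action of agent 1
        q2 = payoff.T @ pi1  # expected reward of each action of agent 2
        new1 = (1 - damping) * pi1 + damping * softmax(q1, alpha)
        new2 = (1 - damping) * pi2 + damping * softmax(q2, alpha)
        delta = max(np.abs(new1 - pi1).max(), np.abs(new2 - pi2).max())
        pi1, pi2 = new1, new2
        if delta < tol:
            break
    return pi1, pi2
```

For example, with `payoff = np.array([[2.0, 0.0], [0.0, 1.0]])` and `alpha=1.0`, both agents put roughly 80% probability on the higher-payoff action instead of choosing it deterministically. As `alpha` shrinks, the softmax responses sharpen toward best responses. Damping is used only to make the simple fixed-point iteration more stable in this toy setting.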

ERMAS: Becoming Robust to Reward Function Sim-to-Real Gaps in Multi-Agent Simulations

Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms

Robust Multi-Agent Control via Maximum Entropy Heterogeneous-Agent Reinforcement Learning

Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty

Learning to Play General-Sum Games against Multiple Boundedly Rational Agents

Learning and Calibrating Heterogeneous Bounded Rational Market Behaviour with Multi-Agent Reinforcement Learning

A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems

Robust Cooperative Multi-Agent Reinforcement Learning: A Mean-Field Type Game Perspective

Simulating the Economic Impact of Rationality through Reinforcement Learning and Agent-Based Modelling

Robust Multiobjective Reinforcement Learning Considering Environmental Uncertainties

Formal Contracts Mitigate Social Dilemmas in Multi-Agent RL

Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation

Robust Multi-Agent Reinforcement Learning with State Uncertainty

A Risk-Averse Equilibrium for Multi-Agent Systems

Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization

MIR2: Towards Provably Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization

Policy Evaluation and Seeking for Multi-Agent Reinforcement Learning Via Best Response

EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL

Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning

A Multi-agent Cooperative Learning System with Evolution of Social Roles

Formal contracts mitigate social dilemmas in multi-agent reinforcement learning