Abstract: We study provable multi-agent reinforcement learning (RL) in the general framework of partially observable stochastic games (POSGs). To circumvent the known hardness results and the use of computationally intractable oracles, we advocate leveraging the potential \emph{information-sharing} among agents, a common practice in empirical multi-agent RL, and a standard model for multi-agent control systems with communications. We first establish several computational complexity results to justify the necessity of information-sharing, as well as the observability assumption that has enabled quasi-efficient single-agent RL with partial observations, for efficiently solving POSGs. Inspired by the inefficiency of planning in the ground-truth model, we then propose to further \emph{approximate} the shared common information to construct an approximate model of the POSG, in which planning an approximate \emph{equilibrium} (in terms of solving the original POSG) can be quasi-efficient, i.e., take quasi-polynomial time, under the aforementioned assumptions. Furthermore, we develop a partially observable multi-agent RL algorithm that is \emph{both} statistically and computationally quasi-efficient. Finally, beyond equilibrium learning, we extend our algorithmic framework to finding the \emph{team-optimal solution} in cooperative POSGs, i.e., decentralized partially observable Markov decision processes, a much more challenging goal. We establish concrete computational and sample complexities under several common structural assumptions of the model. We hope our study opens up the possibility of leveraging and even designing different \emph{information structures}, a well-studied notion in control theory, for developing both sample- and computation-efficient partially observable multi-agent RL.

Multi-agent Off-policy Actor-Critic Reinforcement Learning for Partially Observable Environments

Observer-Based Multiagent Deep Reinforcement Learning: A Fully Distributed Training Scheme

Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems

Leveraging Fully Observable Policies for Learning under Partial Observability

Less Is More: Robust Robot Learning via Partially Observable Multi-Agent Reinforcement Learning

Optimal Decision-Making in Mixed-Agent Partially Observable Stochastic Environments via Reinforcement Learning

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Partially Observable Multi-Agent Reinforcement Learning with Information Sharing

Offline-Online Actor-Critic for Partially Observable Markov Decision Process

Multi-agent Reinforcement Learning by the Actor-Critic Model with an Attention Interface

Emergent Social Learning via Multi-agent Reinforcement Learning

Heterogeneous Multi-Agent Reinforcement Learning for Unknown Environment Mapping

Cooperative Multi-Agent Reinforcement Learning with Partial Observations

Decentralized Multi-Agent Reinforcement Learning: An Off-Policy Method

Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities

Learning Interpretable Policies in Hindsight-Observable POMDPs through Partially Supervised Reinforcement Learning

Joint Recurrent Actor-Critic Model for Partially Observable Control

Off-Policy Neural Fitted Actor-Critic

R-MADDPG for Partially Observable Environments and Limited Communication

Unbiased Asymmetric Reinforcement Learning under Partial Observability