Abstract: Stochastic games are a popular framework for studying multi-agent reinforcement learning (MARL). Recent advances in MARL have focused primarily on games with finitely many states. In this work, we study multi-agent learning in stochastic games with general state spaces and an information structure in which agents do not observe each other's actions. In this context, we propose a decentralized MARL algorithm and we prove the near-optimality of its policy updates. Furthermore, we study the global policy-updating dynamics for a general class of best-reply based algorithms and derive a closed-form characterization of convergence probabilities over the joint policy space.

What problem does this paper attempt to address?

The paper addresses decentralized multi-agent reinforcement learning (MARL) in continuous-space stochastic games with standard Borel state spaces. Specifically, it designs a decentralized MARL algorithm for an information structure in which agents cannot observe one another's actions, and it proves that the algorithm's policy updates are approximately optimal. In addition, it studies the global policy-update dynamics of a class of best-reply-based algorithms and derives a closed-form expression for the convergence probabilities over the joint policy space.

### Main problem decomposition

1. **Policy updates in a decentralized learning environment**
   - Because agents cannot directly observe one another's actions, each agent needs a mechanism for updating its policy independently, based only on its own observations.
   - The paper quantizes the state space, mapping the continuous state space onto a finite set of states so that Q-learning can be applied.
2. **Proof of approximate optimality**
   - Under the assumptions of weakly continuous transition kernels and continuous, bounded cost functions, the paper proves that the policy updates of the proposed decentralized MARL algorithm are asymptotically optimal.
   - Using exploration phases, agents update their policies only at predetermined times and keep them unchanged between those times.
3. **Analysis of global policy-update dynamics**
   - The paper models the global policy-update dynamics of a class of best-reply-based algorithms as a Markov chain.
   - It derives a closed-form expression for the probability of converging to an equilibrium under a weakly cyclic structure.
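The Markov-chain view in point 3 can be illustrated on a toy example. This is a minimal sketch under stated assumptions, not the paper's construction: it assumes a hypothetical two-agent setting in which each agent holds policy 0 or 1 and its best reply is to match the other agent, and a hypothetical `inertia` parameter: an agent that is not best-replying keeps its policy with probability `inertia[i]` and otherwise switches to its best reply. Equilibria are then absorbing states, and the convergence probabilities follow from the textbook absorbing-chain formula \(B = (I - T)^{-1} R\), where \(T\) holds transient-to-transient and \(R\) transient-to-absorbing transition probabilities. The helper names (`best_reply`, `transition_matrix`, `absorption_probabilities`) are illustrative.

```python
import itertools

import numpy as np


# Hypothetical toy structure (not from the paper): two agents, each holding
# policy 0 or 1, and each agent's best reply is to match the other agent.
def best_reply(agent, joint):
    return joint[1 - agent]


def transition_matrix(inertia):
    """Policy-update chain over joint policies (assumed update rule).

    An agent that is already best-replying keeps its policy; otherwise it
    keeps its policy with probability inertia[i] and switches to its best
    reply with probability 1 - inertia[i], independently of the other agent.
    """
    states = list(itertools.product([0, 1], repeat=2))
    index = {s: k for k, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for s in states:
        options = []  # per agent: list of (next policy, probability)
        for i in range(2):
            br = best_reply(i, s)
            if s[i] == br:
                options.append([(s[i], 1.0)])
            else:
                options.append([(s[i], inertia[i]), (br, 1.0 - inertia[i])])
        # Independent updates, so joint probabilities multiply.
        for (u0, p0), (u1, p1) in itertools.product(*options):
            P[index[s], index[(u0, u1)]] += p0 * p1
    return states, P


def absorption_probabilities(P, absorbing):
    """Closed form for an absorbing chain: B = (I - T)^{-1} R."""
    transient = [k for k in range(P.shape[0]) if k not in absorbing]
    T = P[np.ix_(transient, transient)]  # transient -> transient
    R = P[np.ix_(transient, absorbing)]  # transient -> absorbing
    B = np.linalg.solve(np.eye(len(transient)) - T, R)
    return transient, B


states, P = transition_matrix(inertia=(0.5, 0.75))
equilibria = [k for k, s in enumerate(states) if s[0] == s[1]]  # absorbing
transient, B = absorption_probabilities(P, equilibria)
for row, k in enumerate(transient):
    print(states[k], "->", [states[e] for e in equilibria], B[row].round(3))
```

With `inertia=(0.5, 0.75)`, agent 0 revises more often, so from `(0, 1)` the chain is absorbed at `(1, 1)` with probability 2/3 and at `(0, 0)` with probability 1/3 (and symmetrically from `(1, 0)`). Solving the linear system rather than forming the inverse explicitly is the usual numerically safer choice.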
### Formula representation

- Quantization mapping of the state space:
  \[ q: X \to Y \]
  where \( X \) is the original state space and \( Y=\{y_1, y_2,\dots,y_M\} \) is the quantized finite state space.
- Quantized Q-learning update rule:
  \[ Q_t^{(i)}(q(x), u)=(1 - \alpha_t(q(x), u))Q_{t - 1}^{(i)}(q(x), u)+\alpha_t(q(x), u)\left(c(x, u)+\beta\min_{v\in U_i}Q_{t - 1}^{(i)}(q(X_{t + 1}), v)\right) \]
- Best-reply set:
  \[ \text{BR}_i^\delta(Q_t)=\left\{\hat{\gamma}\in\hat{\Pi}_i^q:Q_t(y,\hat{\gamma}(y))\leq\min_{u\in U_i}Q_t(y, u)+\delta_i,\ \forall y\in Y_i^q\right\} \]

### Summary

The main contribution of the paper is a decentralized multi-agent reinforcement learning algorithm for stochastic games with continuous state spaces, together with a proof that its policy updates are approximately optimal. In addition, the paper analyzes the global policy-update dynamics of best-reply-based algorithms and gives a closed-form expression for the probability of converging to an equilibrium. These results provide a theoretical basis for understanding and optimizing decentralized multi-agent systems.
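The quantization map, the quantized Q-learning update, and the \(\delta\)-best-reply set can be exercised together in a short sketch. It assumes, for illustration only, a scalar state space \(X=[0,1]\) with a uniform quantizer onto \(M\) bins, two actions, hypothetical toy dynamics (`step`) and cost (`cost`), uniformly random exploration, and step sizes \(\alpha_t(y,u) = 1/n_t(y,u)\) (inverse visit counts). The other agents' policies are treated as frozen and folded into `step`, mirroring how policies stay fixed within an exploration phase; the multi-agent coupling itself is not modeled, and none of these choices is claimed to be the paper's.

```python
import numpy as np


def quantize(x, M):
    """Uniform quantizer q: [0, 1] -> {0, ..., M-1}; bin k stands in for y_{k+1}."""
    return min(int(x * M), M - 1)


# Hypothetical toy dynamics and cost (not from the paper). The other agents'
# policies are assumed frozen and absorbed into these dynamics.
def step(x, u, rng):
    drift = 0.1 if u == 1 else -0.1
    return float(np.clip(x + drift + 0.05 * rng.standard_normal(), 0.0, 1.0))


def cost(x, u):
    # Continuous, bounded cost on [0, 1]; it depends on the raw state x.
    return (x - 0.8) ** 2


def quantized_q_learning(step, cost, M, n_actions, beta=0.9, T=20_000, seed=0):
    """Q-learning on quantized states with uniformly random exploration."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((M, n_actions))
    visits = np.zeros((M, n_actions))
    x = rng.random()
    for _ in range(T):
        y = quantize(x, M)
        u = int(rng.integers(n_actions))
        x_next = step(x, u, rng)
        visits[y, u] += 1
        alpha = 1.0 / visits[y, u]  # alpha_t(q(x), u) = 1 / (visit count)
        # Target uses the current table, i.e. Q_{t-1}, before this update.
        target = cost(x, u) + beta * Q[quantize(x_next, M)].min()
        Q[y, u] = (1 - alpha) * Q[y, u] + alpha * target
        x = x_next
    return Q


def in_best_reply_set(Q, policy, delta):
    """Check Q(y, policy[y]) <= min_u Q(y, u) + delta for every quantized y."""
    chosen = Q[np.arange(Q.shape[0]), policy]
    return bool(np.all(chosen <= Q.min(axis=1) + delta))


Q = quantized_q_learning(step, cost, M=10, n_actions=2)
greedy = Q.argmin(axis=1)  # a greedy policy lies in the delta-best-reply set for any delta >= 0
print(greedy, in_best_reply_set(Q, greedy, delta=0.0))
```

A positive `delta` enlarges the set beyond greedy policies, which lets an agent keep its current policy when it is already within \(\delta_i\) of optimal rather than switching on small estimation noise.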

Decentralized Multi-Agent Reinforcement Learning for Continuous-Space Stochastic Games
