Abstract: Large-scale multi-agent systems are often deployed across wide geographic areas, where agents interact with heterogeneous environments. There is emerging interest in understanding the role of heterogeneity in the performance of federated versions of classic reinforcement learning algorithms. In this paper, we study synchronous federated Q-learning, which aims to learn an optimal Q-function by having $K$ agents average their local Q-estimates every $E$ iterations. We observe an interesting phenomenon in how the convergence speed depends on $K$ and $E$. As in homogeneous environments, there is a linear speed-up with respect to $K$ in reducing the errors that arise from sampling randomness. Yet, in sharp contrast to the homogeneous setting, $E>1$ leads to significant performance degradation. Specifically, we provide a fine-grained characterization of the error evolution in the presence of environmental heterogeneity, and show that the error decays to zero as the number of iterations $T$ increases. The slow convergence for $E>1$ turns out to be fundamental rather than an artifact of our analysis. We prove that, for a wide range of stepsizes, the $\ell_{\infty}$ norm of the error cannot decay faster than $\Theta(E/T)$. In addition, our experiments demonstrate that the convergence exhibits an interesting two-phase phenomenon. For any given stepsize, there is a sharp phase transition in the convergence: the error decays rapidly in the beginning, yet later bounces up and stabilizes. Provided that the phase-transition time can be estimated, choosing different stepsizes for the two phases leads to faster overall convergence.
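To make the setup in the abstract concrete, the following is a minimal sketch of synchronous federated Q-learning, not the authors' implementation. It assumes a tabular setting in which each of the $K$ agents has its own transition kernel (the source of heterogeneity), all agents share one reward table, every state-action pair is updated at each iteration from a freshly sampled next state, and the stepsize is constant. The function name `federated_q_learning` and all of its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def federated_q_learning(P, R, gamma=0.9, T=200, E=5, stepsize=0.1, seed=0):
    """Illustrative sketch (assumed setup, not the paper's exact algorithm).

    P: array of shape (K, S, A, S), one transition kernel per agent.
    R: array of shape (S, A), a reward table assumed shared by all agents.
    Returns the average of the agents' Q-tables after T iterations.
    """
    rng = np.random.default_rng(seed)
    K, S, A, _ = P.shape
    Q = np.zeros((K, S, A))
    for t in range(1, T + 1):
        for k in range(K):
            # Inverse-CDF sampling: one next state per (s, a) from agent k's kernel.
            cdf = P[k].cumsum(axis=-1)
            u = rng.random((S, A, 1))
            next_s = np.minimum((u > cdf).sum(axis=-1), S - 1)
            # Synchronous Q-learning update on every (s, a) entry.
            target = R + gamma * Q[k].max(axis=-1)[next_s]
            Q[k] = (1 - stepsize) * Q[k] + stepsize * target
        if t % E == 0:
            # Communication round: every agent adopts the averaged Q-table.
            Q[:] = Q.mean(axis=0)
    return Q.mean(axis=0)

# Example usage with K=4 randomly generated heterogeneous environments.
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(3), size=(4, 3, 2))  # shape (K=4, S=3, A=2, S=3)
R = rng.random((3, 2))                         # rewards in [0, 1)
Q_avg = federated_q_learning(P, R, E=5)
```

Setting `E=1` averages after every iteration, while larger `E` lets the local tables drift toward each agent's own environment between averaging rounds, which is the regime the abstract identifies as degrading convergence.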

Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics

Beyond Strict Competition: Approximate Convergence of Multi Agent Q-Learning Dynamics

Stability of Multi-Agent Learning: Convergence in Network Games with Many Players

Stability of Multi-Agent Learning in Competitive Networks: Delaying the Onset of Chaos

Convergence of Heterogeneous Learning Dynamics in Zero-sum Stochastic Games

Efficient off‐policy Q‐learning for multi‐agent systems by solving dual games

Independent and Decentralized Learning in Markov Potential Games

Multi-Agent Alternate Q-Learning

Reinforcement Learning for Non-stationary Discrete-Time Linear–Quadratic Mean-Field Games in Multiple Populations

A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

On the Convergence Rates of Federated Q-Learning across Heterogeneous Environments

Multiagent Soft Q-Learning

A Risk-Averse Equilibrium for Multi-Agent Systems

Dynamics of Boltzmann Q-Learning in Two-Player Two-Action Games

Mutation-Bias Learning in Games

Convergence of Decentralized Actor-Critic Algorithm in General-sum Markov Games

Independent Learning in Stochastic Games

Meta-game equilibrium for multi-agent reinforcement learning

Generalized Individual Q-learning for Polymatrix Games with Partial Observations

Convergence Analysis of Graphical Game-Based Nash Q-Learning Using the Interaction Detection Signal of N-Step Return

Convergence of Multi-Scale Reinforcement Q-Learning Algorithms for Mean Field Game and Control Problems