Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning

Jinchi Chen, Jie Feng, Weiguo Gao, Ke Wei
DOI: https://doi.org/10.48550/arxiv.2209.02179
2024-01-01
Journal of Machine Learning Research
Abstract: This paper studies a policy optimization problem arising from collaborative multi-agent reinforcement learning in a decentralized setting where agents communicate with their neighbors over an undirected graph to maximize the sum of their cumulative rewards. A novel decentralized natural policy gradient method, dubbed Momentum-based Decentralized Natural Policy Gradient (MDNPG), is proposed, which incorporates natural gradient, momentum-based variance reduction, and gradient tracking into the decentralized stochastic gradient ascent framework. The $\mathcal{O}(n^{-1}\epsilon^{-3})$ sample complexity for MDNPG to converge to an $\epsilon$-stationary point has been established under standard assumptions, where $n$ is the number of agents. It indicates that MDNPG can achieve the optimal convergence rate for decentralized policy gradient methods and possesses a linear speedup in contrast to centralized optimization methods. Moreover, superior empirical performance of MDNPG over other state-of-the-art algorithms has been demonstrated by extensive numerical experiments.
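To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of decentralized stochastic gradient ascent with gradient tracking and a momentum-based variance-reduced gradient estimator. It is not the paper's MDNPG: the natural-gradient preconditioning and everything specific to policy optimization are omitted, and the problem is a toy one in which each agent holds a scalar concave objective. The ring graph, the mixing weights, the local objectives, the step size `eta`, the momentum weight `beta`, and all function names are assumptions made purely for illustration.

```python
import random


def mix(W, z):
    """One round of neighbor averaging: returns the product W z for a list z."""
    n = len(z)
    return [sum(W[i][j] * z[j] for j in range(n)) for i in range(n)]


def run(n=5, steps=2000, eta=0.05, beta=0.2, sigma=0.1, seed=0):
    """Toy decentralized ascent; returns final iterates and the global optimum."""
    rng = random.Random(seed)
    # Hypothetical local objectives f_i(x) = -0.5 * (x - c_i)^2, so the sum of
    # all f_i is maximized at the mean of the c_i.
    targets = [float(i) for i in range(n)]

    # Assumed topology: undirected ring with uniform, doubly stochastic weights.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] = 1.0 / 3.0

    def grad(i, x, noise):
        # Stochastic gradient of f_i at x; `noise` plays the role of the sample.
        return -(x - targets[i]) + noise

    x = [0.0] * n
    v = [grad(i, x[i], rng.gauss(0.0, sigma)) for i in range(n)]  # local estimators
    y = list(v)  # gradient trackers, initialized to the local estimators

    for _ in range(steps):
        # Ascent step: average with neighbors, then move along the tracked direction.
        x_new = [m + eta * y_i for m, y_i in zip(mix(W, x), y)]

        # Momentum-based variance reduction: the same sample is evaluated at the
        # new and the old iterate, and the previous estimator is corrected.
        v_new = []
        for i in range(n):
            noise = rng.gauss(0.0, sigma)
            g_new = grad(i, x_new[i], noise)
            g_old = grad(i, x[i], noise)
            v_new.append(g_new + (1.0 - beta) * (v[i] - g_old))

        # Gradient tracking: y_i follows the network-wide average of the v_i.
        y = [m + vn - vo for m, vn, vo in zip(mix(W, y), v_new, v)]
        x, v = x_new, v_new

    return x, sum(targets) / n


if __name__ == "__main__":
    iterates, optimum = run()
    print("final iterates:", iterates)
    print("global optimum:", optimum)
```

In this sketch each agent only exchanges `x` and `y` with its two ring neighbors, yet every local iterate ends up near the maximizer of the summed objective, which is the behavior gradient tracking is meant to provide. Setting `beta=1.0` reduces the estimator to a plain stochastic gradient.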