Distributed policy evaluation via inexact ADMM in multi-agent reinforcement learning

Xiaoxiao Zhao, Peng Yi, Li Li
DOI: https://doi.org/10.1007/s11768-020-00007-x
2020-01-01
Control Theory and Technology
Abstract: This paper studies distributed policy evaluation in multi-agent reinforcement learning. In the cooperative setting, each agent observes only a local reward, while all agents share a common environmental state. To optimize the global return, defined as the sum of the local returns, the agents exchange information with their neighbors over a communication network. The mean squared projected Bellman error minimization problem is reformulated as a constrained convex optimization problem with a consensus constraint, and a distributed alternating direction method of multipliers (ADMM) algorithm is proposed to solve it. Furthermore, an inexact step is used in ADMM to keep the computation at each iteration efficient. The convergence of the proposed algorithm is established.
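The consensus idea in the abstract can be illustrated with a toy sketch. It is not the algorithm from the paper. It assumes each agent i privately holds a scalar quadratic objective f_i(theta) = 0.5 * a_i * theta^2 - b_i * theta as a stand-in for its local share of the projected Bellman error, and it replaces the exact ADMM primal minimization with a single linearized (gradient-type) step, which is one common way to make ADMM inexact. The function name, the parameters `c` and `rho`, and the ring network are all hypothetical choices made for illustration.

```python
def inexact_consensus_admm(a, b, neighbors, c=1.0, rho=None, iters=2000):
    """Toy decentralized ADMM with a linearized (inexact) primal step.

    Agent i privately holds f_i(theta) = 0.5 * a[i] * theta**2 - b[i] * theta
    and only exchanges its current estimate with the agents in neighbors[i].
    """
    n = len(a)
    if rho is None:
        rho = max(a)  # assumed proximal weight: at least the largest local curvature
    theta = [0.0] * n  # each agent's local copy of the shared parameter
    phi = [0.0] * n    # each agent's dual variable for the consensus constraint
    for _ in range(iters):
        new_theta = []
        for i in range(n):
            grad = a[i] * theta[i] - b[i]  # local gradient, uses local data only
            disagreement = sum(theta[i] - theta[j] for j in neighbors[i])
            step = 1.0 / (2.0 * c * len(neighbors[i]) + rho)
            # Inexact primal update: one linearized step, not an exact argmin.
            new_theta.append(theta[i] - step * (grad + c * disagreement + phi[i]))
        theta = new_theta
        for i in range(n):
            # Dual ascent on the remaining disagreement with neighbors.
            phi[i] += c * sum(theta[i] - theta[j] for j in neighbors[i])
    return theta


# Hypothetical data: four agents on a ring, each with its own curvature and reward term.
a = [1.0, 2.0, 0.5, 1.5]
b = [1.0, -1.0, 2.0, 0.5]
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

estimates = inexact_consensus_admm(a, b, ring)
centralized = sum(b) / sum(a)  # minimizer of the summed objective
```

In this sketch every agent touches only its own `a[i]`, `b[i]` and its neighbors' current estimates, yet the local copies should agree with the minimizer of the summed objective, which is the role the consensus constraint plays in the abstract.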