Regularization-Adapted Anderson Acceleration for Multi-Agent Reinforcement Learning

Siying Wang, Wenyu Chen, Liwei Huang, Fan Zhang, Zhitong Zhao, Hong Qu
DOI: https://doi.org/10.1016/j.knosys.2023.110709
Impact factor: 8.139
2023-01-01
Knowledge-Based Systems
Abstract: Many modern multi-agent reinforcement learning (MARL) algorithms originate from model-free reinforcement learning (RL) and adopt the Centralized Training with Decentralized Execution (CTDE) paradigm to mitigate non-stationarity and stabilize training. However, like their single-agent counterparts, these methods still suffer from sample inefficiency and slow convergence. Common remedies rely on experience replay buffers and parallel training to speed up learning, but they consume more computing resources and may still underuse the sampled experiences. In this paper, we propose Regularization-Adapted Anderson Acceleration (RA3) for model-free, off-policy MARL algorithms. Under the CTDE paradigm, RA3 treats the joint action-value function update as a fixed-point iteration and accelerates training using the same amount of sampled experience as the baseline algorithms. Furthermore, RA3 employs an adaptive regularization strategy based on Bellman residuals to stabilize the update process and improve training performance. Experimental results on the predator–prey game and the challenging StarCraft II micromanagement benchmark show that the proposed method significantly improves both learning speed and performance.
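The abstract frames the value update as a fixed-point iteration that Anderson acceleration speeds up, with a regularizer tied to Bellman residuals. As a rough illustration of that general idea only (not the authors' RA3 implementation), the sketch below applies a generic regularized Anderson scheme to a toy tabular policy-evaluation problem. It assumes NumPy; the mixing weights get a Tikhonov penalty and must sum to one, and the adaptive rule `lam = lam0 * ||residual||^2`, the window size `m`, and all function names (`regularized_anderson`, `plain_iteration`, `bellman`) are hypothetical choices made for this example.

```python
import numpy as np


def regularized_anderson(g, x0, m=3, lam0=1e-2, tol=1e-8, max_iter=500):
    """Solve x = g(x) with regularized Anderson acceleration (illustrative sketch).

    Mixing weights alpha minimise ||F alpha||^2 + lam * ||alpha||^2 subject to
    sum(alpha) = 1, where the columns of F are the last m residuals
    f_i = g(x_i) - x_i. The penalty lam = lam0 * ||f_k||^2 follows the current
    residual; this adaptive rule is a hypothetical stand-in, not RA3's rule.
    Returns (x, number_of_iterations).
    """
    x = np.asarray(x0, dtype=float)
    g_hist, f_hist = [], []
    for k in range(max_iter):
        gx = g(x)
        f = gx - x  # residual of the fixed-point equation (a Bellman residual here)
        if np.linalg.norm(f) < tol:
            return x, k
        g_hist.append(gx)
        f_hist.append(f)
        if len(f_hist) > m:  # sliding window of the most recent m iterates
            g_hist.pop(0)
            f_hist.pop(0)
        F = np.stack(f_hist, axis=1)
        lam = lam0 * float(f @ f)  # hypothetical residual-adaptive regularization
        A = F.T @ F + lam * np.eye(F.shape[1])
        # Closed-form solution of the sum-to-one constrained regularized least squares
        z = np.linalg.solve(A, np.ones(F.shape[1]))
        alpha = z / z.sum()
        x = np.stack(g_hist, axis=1) @ alpha  # mix the mapped iterates g(x_i)
    return x, max_iter


def plain_iteration(g, x0, tol=1e-8, max_iter=500):
    """Baseline fixed-point iteration x_{k+1} = g(x_k) for comparison."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        gx = g(x)
        if np.linalg.norm(gx - x) < tol:
            return x, k
        x = gx
    return x, max_iter


# Toy policy evaluation: V = r + gamma * P @ V, with a doubly stochastic P.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.5, 0.0, 0.0, 0.5]])
r = np.array([1.0, 0.0, 2.0, -1.0])
gamma = 0.9


def bellman(v):
    """Bellman evaluation operator for the toy problem above."""
    return r + gamma * P @ v


v_aa, iters_aa = regularized_anderson(bellman, np.zeros(4))
v_plain, iters_plain = plain_iteration(bellman, np.zeros(4))
v_true = np.linalg.solve(np.eye(4) - gamma * P, r)
print(iters_aa, iters_plain, np.max(np.abs(v_aa - v_true)))
```

Putting all weight on the latest iterate (the plain step) is always a feasible choice of weights, so the optimal mix satisfies ||F alpha||^2 <= (1 + lam0) * ||f_k||^2. In this sketch `lam0` therefore caps how much mixing can inflate the residual, and larger values pull the update back toward plain fixed-point iteration.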