Gradient Correction for Asynchronous Stochastic Gradient Descent in Reinforcement Learning

Jiaxin Gao, Yao Lyu, Wenxuan Wang, Yuming Yin, Fei Ma, Shengbo Eben Li
DOI: https://doi.org/10.1007/978-3-031-70392-8_127
2024-01-01
Abstract: Distributed stochastic gradient descent has gained significant attention in recent years as a prevalent approach to reinforcement learning. Current distributed learning predominantly employs either synchronous or asynchronous training. The asynchronous scheme avoids the idle computing resources of synchronous methods, but it suffers from the stale gradient problem: a worker computes its gradient at parameters that the server may already have updated. This paper introduces a gradient correction algorithm that alleviates the stale gradient problem. By using second-order information available at the worker node together with the current parameters of both the worker and the server nodes, the algorithm yields a corrected gradient that is closer to the desired value. First, we outline the challenges of asynchronous update schemes and derive a gradient correction algorithm based on local second-order approximations. Second, we propose an asynchronous training scheme that incorporates gradient correction within the generalized policy iteration framework. Finally, on trajectory tracking tasks, we compare an asynchronous update scheme with and without gradient correction. Simulation results show that the proposed training scheme converges notably faster and reaches higher policy performance than existing asynchronous update methods.
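The abstract does not give the correction formula, so the following is only a minimal sketch of the general idea, assuming the correction takes the form of a first-order Taylor expansion of the gradient around the worker's parameters: g(w_server) ≈ g(w_worker) + H(w_worker)(w_server − w_worker). The function name `corrected_gradient`, the toy quadratic loss, and all numerical values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def corrected_gradient(stale_grad, hessian, w_worker, w_server):
    """Hypothetical helper: shift a stale gradient toward the server's parameters.

    Uses the expansion g(w_server) ~= g(w_worker) + H(w_worker) @ (w_server - w_worker),
    where H is second-order information computed on the worker.
    """
    return stale_grad + hessian @ (w_server - w_worker)


# Toy quadratic loss f(w) = 0.5 * w^T A w - b^T w (assumed for illustration).
# Its gradient is A @ w - b and its Hessian is the constant matrix A,
# so the Taylor correction is exact here; for a general loss it is approximate.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])


def grad(w):
    return A @ w - b


w_worker = np.array([0.5, -0.5])  # parameters the worker used (now stale)
w_server = np.array([0.2, -0.1])  # parameters currently held by the server

stale = grad(w_worker)            # gradient the worker actually computed
fixed = corrected_gradient(stale, A, w_worker, w_server)
target = grad(w_server)           # gradient the server would ideally receive

print("stale gradient:    ", stale)
print("corrected gradient:", fixed)
print("target gradient:   ", target)
```

On this quadratic toy problem the corrected gradient coincides with the gradient at the server's parameters, while the stale gradient does not. For a neural-network policy or value function the Hessian is rarely formed explicitly, so a practical scheme would have to rely on some cheaper local second-order approximation; which one the authors use is not stated in the abstract.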