PRAG: Periodic Regularized Action Gradient for Efficient Continuous Control

Xihui Li, Zhongjian Qiao, Aicheng Gong, Jiafei Lyu, Chenghui Yu, Jiangpeng Yan, Xiu Li
DOI: https://doi.org/10.1007/978-3-031-20868-3_8
2022-01-01
Abstract: For actor-critic methods in reinforcement learning, it is vital to learn a useful critic so that the actor can be guided efficiently and properly. Previous methods mainly seek to estimate more accurate Q-values. However, in continuous control scenarios where the actor is updated via the deterministic policy gradient, only the action gradient (AG) of the critic is used to update the actor. Leveraging the action gradient of Q-functions for policy guidance is therefore a promising way to achieve higher sample efficiency. Nevertheless, we empirically find that directly incorporating the action gradient into the critics degrades the performance of the agent, as it can easily become trapped in local maxima. To fully utilize the benefits of the action gradient and escape from local optima, we propose Periodic Regularized Action Gradient (PRAG), which periodically involves the action gradient in critic learning and additionally maximizes the target value. On a set of MuJoCo continuous control tasks, we show that PRAG achieves higher sample efficiency and better final performance than common model-free baselines, without much extra training cost. Our code is available at: https://github.com/Galaxy-Li/PRAG.
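To make the role of the action gradient concrete, the following minimal sketch illustrates the standard deterministic policy gradient chain rule, in which the actor update depends on the critic only through dQ/da. It is not the PRAG algorithm and is not taken from the authors' repository: the quadratic critic, the linear policy, and all function names (`q_value`, `action_gradient`, `policy`, `dpg_step`) are assumptions chosen purely for illustration.

```python
def q_value(state, action):
    """Toy critic (assumed for illustration): Q(s, a) = -(a - s)^2, maximized at a = s."""
    return -(action - state) ** 2


def action_gradient(state, action, eps=1e-5):
    """Central finite-difference estimate of the action gradient dQ/da."""
    return (q_value(state, action + eps) - q_value(state, action - eps)) / (2 * eps)


def policy(theta, state):
    """Toy deterministic linear policy (assumed): a = theta * s."""
    return theta * state


def dpg_step(theta, states, lr=0.1):
    """One gradient-ascent step on theta using dQ/da * da/dtheta."""
    grad = 0.0
    for s in states:
        a = policy(theta, s)
        # For the linear policy, da/dtheta = s, so only dQ/da comes from the critic.
        grad += action_gradient(s, a) * s
    return theta + lr * grad / len(states)


theta = 0.0
states = [0.5, 1.0, 1.5]
for _ in range(200):
    theta = dpg_step(theta, states)
```

In this toy setting the actor never reads the Q-value itself, only its slope with respect to the action, which is the observation the abstract builds on. With the critic above, the optimal action equals the state, so `theta` should approach 1.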