Efficient Reinforcement Learning With the Novel N-Step Method and V-Network
Miaomiao Zhang,Shuo Zhang,Xinying Wu,Zhiyi Shi,Xiangyang Deng,Edmond Q Wu,Xin Xu
DOI: https://doi.org/10.1109/TCYB.2024.3401014
Abstract:The application of reinforcement learning (RL) in artificial intelligence has become increasingly widespread. However, its drawbacks are also apparent, as it requires a large number of samples for support, making the enhancement of sample efficiency a research focus. To address this issue, we propose a novel N-step method. This method extends the horizon of the agent, enabling it to acquire more long-term effective information, thus resolving the issue of data inefficiency in RL. Additionally, this N-step method can reduce the estimation variance of Q-function, which is one of the factors contributing to estimation errors in Q-function estimation. Apart from high variance, estimation bias in Q-function estimation is another factor leading to estimation errors. To mitigate the estimation bias of Q-function, we design a regularization method based on the V-function, which has been underexplored. The combination of these two methods perfectly addresses the problems of low sample efficiency and inaccurate Q-function estimation in RL. Finally, extensive experiments conducted in discrete and continuous action spaces demonstrate that the proposed novel N-step method, when combined with classical deep Q-network, deep deterministic policy gradient, and TD3 algorithms, is effective, consistently outperforming the classical algorithms.