Linfa-Q: Accurate Q-Learning with Linear Function Approximation
Zhechao Wang, Qiming Fu, Jianping Chen, Quan Liu, You Lu, Hongjie Wu, Fuyuan Hu
DOI: https://doi.org/10.1016/j.neucom.2024.128654
IF: 6
2025-01-01
Neurocomputing
Abstract: Although Q-learning has achieved remarkable success in some practical applications, it often suffers from overestimation in stochastic environments, which is commonly viewed as a shortcoming of the method. The overestimation is introduced when estimating the Q value of the next state, a phenomenon known as maximization bias. In this paper, we propose a more accurate method for estimating the Q value, based on linear function approximation, that decomposes the Q value and re-evaluates it with similar samples. Specifically, we reformulate the parameterized incremental update formula of Q-learning and demonstrate that the new formula is equivalent to the original one. Moreover, we propose a new parameterized incremental update formula that addresses the overestimation problem, and present a more accurate computation method that can be used in problems with continuous state spaces and stochastic environments. Experimentally, compared with Doubly Bounded Q-learning and other Q-learning-based methods, the new algorithm improves performance by more than 31% in Mountain Car and Cart Pole. Furthermore, the algorithm is robust to the learning rate and its memory capacity. Finally, we discuss the practical applicability of the algorithm through an analysis of time consumption.
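For orientation, the baseline that the abstract starts from can be sketched as follows. This is the standard textbook Q-learning update with linear function approximation, not the Linfa-Q update proposed in the paper, whose decomposition and re-evaluation steps are not specified in the abstract. The function names (`q_value`, `q_learning_update`), the feature vectors, and the hyperparameter values are all illustrative assumptions. The `max` over next-state estimates is the step where maximization bias enters.

```python
from typing import List, Sequence


def q_value(theta: Sequence[float], phi: Sequence[float]) -> float:
    """Linear estimate Q(s, a) = theta . phi(s, a)."""
    return sum(t * f for t, f in zip(theta, phi))


def q_learning_update(
    theta: Sequence[float],
    phi_sa: Sequence[float],
    reward: float,
    next_phis: Sequence[Sequence[float]],
    alpha: float = 0.1,
    gamma: float = 0.99,
) -> List[float]:
    """One standard (baseline) Q-learning step; NOT the paper's Linfa-Q update.

    next_phis holds the hypothetical feature vectors phi(s', a') for every
    action a' available in the next state; pass an empty list if s' is terminal.
    """
    # Greedy bootstrap target: taking the max over noisy estimates is the
    # source of the overestimation (maximization bias).
    max_next = max((q_value(theta, p) for p in next_phis), default=0.0)
    td_error = reward + gamma * max_next - q_value(theta, phi_sa)
    # Gradient of a linear Q with respect to theta is simply phi(s, a).
    return [t + alpha * td_error * f for t, f in zip(theta, phi_sa)]


# Two updates on made-up one-hot features (illustrative values only).
theta = [0.0, 0.0]
theta = q_learning_update(theta, [1.0, 0.0], 1.0, [[0.0, 1.0]], alpha=0.5, gamma=0.9)
theta = q_learning_update(theta, [0.0, 1.0], 2.0, [[1.0, 0.0]], alpha=0.5, gamma=0.9)
print(theta)
```

With one-hot features this reduces to tabular Q-learning, which makes the arithmetic easy to check by hand; continuous state spaces such as Mountain Car would need a richer feature map.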