Online attentive kernel-based temporal difference learning
Xingguo Chen, Guang Yang, Shangdong Yang, Huihui Wang, Shaokang Dong, Yang Gao
DOI: https://doi.org/10.1016/j.knosys.2023.110902
IF: 8.139
2023-10-01
Knowledge-Based Systems
Abstract: Kernel-based reinforcement learning has received increasing attention because it requires less prior knowledge than linear approximation and neural networks. Online kernel-based updating, however, is hindered by catastrophic forgetting, also called interference. Sparse representation is a key method for addressing this issue, but existing methods fail to satisfy four criteria: learnability, nonprior, nontruncation, and explicitness. In this paper, we present an attentive kernel-based value function approximation as a learnable, nonprior, nontruncated, and explicit sparse representation. We propose the online attentive kernel-based temporal difference (OAKTD) algorithm, which employs two-timescale optimization, and provide a convergence analysis for the algorithm. Experimental results show that OAKTD outperforms online kernel-based TD learning algorithms and the TD learning algorithm with Tile Coding on four classical tasks: Mountain Car, Acrobot, CartPole, and Puddle World.
computer science, artificial intelligence
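
The abstract names two ingredients: a kernel-based value function with a learnable attention mechanism, and two-timescale optimization. The sketch below shows one way these could fit together in a semi-gradient TD(0) update. It is not the OAKTD algorithm from the paper. The Gaussian kernel, the fixed dictionary of centers, the state-independent sigmoid gate per center, the step sizes, and every name (AttentiveKernelTD, alpha_fast, alpha_slow, gaussian_features) are assumptions chosen for illustration.

```python
import math


def gaussian_features(state, centers, width):
    """Gaussian kernel value between a state and each dictionary center."""
    feats = []
    for center in centers:
        sq_dist = sum((s - c) ** 2 for s, c in zip(state, center))
        feats.append(math.exp(-sq_dist / (2.0 * width ** 2)))
    return feats


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class AttentiveKernelTD:
    """Illustrative gated kernel TD(0); an assumption, not the paper's OAKTD."""

    def __init__(self, centers, width=0.5, gamma=0.99,
                 alpha_fast=0.1, alpha_slow=0.01):
        self.centers = centers
        self.width = width
        self.gamma = gamma
        self.alpha_fast = alpha_fast          # step size for value weights
        self.alpha_slow = alpha_slow          # smaller step size for gates
        self.theta = [0.0] * len(centers)     # value weights
        self.w = [0.0] * len(centers)         # attention logits (gate = 0.5)

    def value(self, state):
        """V(s) = sum_i theta_i * sigmoid(w_i) * k(s, c_i)."""
        k = gaussian_features(state, self.centers, self.width)
        return sum(t * sigmoid(w) * ki
                   for t, w, ki in zip(self.theta, self.w, k))

    def update(self, state, reward, next_state, terminal=False):
        """One semi-gradient TD(0) step with two step sizes."""
        k = gaussian_features(state, self.centers, self.width)
        gates = [sigmoid(w) for w in self.w]
        next_value = 0.0 if terminal else self.value(next_state)
        delta = reward + self.gamma * next_value - self.value(state)
        # Both gradients are computed from the pre-update parameters.
        grad_theta = [g * ki for g, ki in zip(gates, k)]
        grad_w = [t * ki * g * (1.0 - g)
                  for t, ki, g in zip(self.theta, k, gates)]
        for i in range(len(self.theta)):
            self.theta[i] += self.alpha_fast * delta * grad_theta[i]
            self.w[i] += self.alpha_slow * delta * grad_w[i]
        return delta
```

A toy usage, again hypothetical: construct `AttentiveKernelTD(centers=[[0.0], [1.0]])` and call `update([0.0], 1.0, [0.0], terminal=True)` repeatedly. The estimate `value([0.0])` then moves toward the reward of 1. Giving the gates a smaller step size than the value weights is what makes this a two-timescale scheme in the loose sense used here.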