Online attentive kernel-based temporal difference learning

Xingguo Chen, Guang Yang, Shangdong Yang, Huihui Wang, Shaokang Dong, Yang Gao
DOI: https://doi.org/10.1016/j.knosys.2023.110902
IF: 8.139
2023-10-01
Knowledge-Based Systems
Abstract: Kernel-based reinforcement learning has received increasing attention because it requires less prior knowledge than linear approximation and neural networks. Online kernel-based updating, however, is hindered by catastrophic forgetting, or interference. Sparse representation is a key method for addressing this issue, but existing methods fail to satisfy four criteria: learnability, nonprior, nontruncation, and explicitness. In this paper, we present an attentive kernel-based value function approximation as a learnable, nonprior, nontruncated, and explicit sparse representation. We propose the online attentive kernel-based temporal difference (OAKTD) algorithm, which employs two-timescale optimization, and provide a convergence analysis for it. Experimental results show that OAKTD outperforms both online kernel-based TD learning algorithms and the TD learning algorithm with Tile Coding on four classical tasks: Mountain Car, Acrobot, CartPole, and Puddle World.
computer science, artificial intelligence