Abstract: In this paper, an online selective kernel-based temporal difference (OSKTD) learning algorithm is proposed to deal with large-scale and/or continuous reinforcement learning problems. OSKTD includes two online procedures: online sparsification and parameter updating for the selective kernel-based value function. A new sparsification method (i.e., a kernel distance-based online sparsification method) is proposed based on selective ensemble learning; it is computationally less complex than other sparsification methods. With the proposed method, the sparsified dictionary of samples is constructed online by checking whether each incoming sample needs to be added to the dictionary. In addition, based on local validity, a selective kernel-based value function is proposed that selects the best samples from the sample dictionary for the value function approximator. The parameters of the selective kernel-based value function are updated iteratively by the temporal difference (TD) learning algorithm combined with gradient descent. The complexity of the online sparsification procedure in the OSKTD algorithm is O(n). OSKTD is compared on two typical problems (Maze and Mountain Car) with both traditional and up-to-date O(n) algorithms (GTD, GTD2, and TDC using the kernel-based value function), and the results demonstrate the effectiveness of the proposed algorithm. In the Maze problem, OSKTD converges to an optimal policy and does so faster than both the traditional and the up-to-date algorithms. In the Mountain Car problem, OSKTD converges, requires less computation time than other sparsification methods, reaches a better local optimum than the traditional algorithms, and converges much faster than the up-to-date algorithms. In addition, the final optimum reached by OSKTD is competitive with that of the up-to-date algorithms.
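
The two online procedures described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the Gaussian kernel, the two thresholds (`mu_add`, `mu_select`), the step size, the exact form of the selection rule, and all class and method names are assumptions made for illustration. The sketch adds a sample to the dictionary only when it is far, in kernel distance, from every stored sample (a single pass over the dictionary), and it evaluates and updates the value function using only the stored samples that lie close to the query state.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; the kernel choice is an assumption."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_distance(x, y, sigma=1.0):
    """Distance between x and y in the kernel-induced feature space."""
    d2 = (gaussian_kernel(x, x, sigma)
          - 2.0 * gaussian_kernel(x, y, sigma)
          + gaussian_kernel(y, y, sigma))
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding errors


class SelectiveKernelTD:
    """Hypothetical sketch: sparsified dictionary plus selective TD(0) update."""

    def __init__(self, mu_add=0.5, mu_select=1.0, alpha=0.1, gamma=0.95, sigma=1.0):
        self.mu_add = mu_add        # assumed threshold for adding a sample
        self.mu_select = mu_select  # assumed threshold for local validity
        self.alpha = alpha          # step size
        self.gamma = gamma          # discount factor
        self.sigma = sigma          # kernel width
        self.dictionary = []        # stored states
        self.theta = []             # one weight per stored state

    def maybe_add(self, s):
        """Online sparsification: one linear pass over the dictionary."""
        far_from_all = all(
            kernel_distance(s, d, self.sigma) > self.mu_add
            for d in self.dictionary
        )
        if far_from_all:
            self.dictionary.append(tuple(s))
            self.theta.append(0.0)
        return far_from_all

    def _selected(self, s):
        """(index, kernel value) pairs for stored samples close to s."""
        return [
            (i, gaussian_kernel(s, d, self.sigma))
            for i, d in enumerate(self.dictionary)
            if kernel_distance(s, d, self.sigma) <= self.mu_select
        ]

    def value(self, s):
        """Selective kernel-based value estimate at state s."""
        return sum(self.theta[i] * k for i, k in self._selected(s))

    def update(self, s, reward, s_next, terminal=False):
        """One TD(0) gradient step on the weights of the selected samples."""
        self.maybe_add(s)
        target = reward if terminal else reward + self.gamma * self.value(s_next)
        delta = target - self.value(s)
        for i, k in self._selected(s):
            self.theta[i] += self.alpha * delta * k
        return delta
```

In this sketch a nearby state is rejected by `maybe_add`, so the dictionary grows only when a new region of the state space is visited, and an update at one state leaves the weights of distant stored samples untouched.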

Online attentive kernel-based temporal difference learning

Online Selective Kernel-Based Temporal Difference Learning

Kernel Least-Squares Temporal Difference Learning

True Online Temporal-Difference Learning

An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks

Online Kernel Learning with a Near Optimal Sparsity Bound

Investigating practical linear temporal difference learning

Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

Statistical Inference for Temporal Difference Learning with Linear Function Approximation

Reanalysis of Variance Reduced Temporal Difference Learning

Modified Retrace for Off-Policy Temporal Difference Learning

Off-Policy Temporal Difference Learning with Bellman Residuals

Provable distributed adaptive temporal-difference learning over time-varying networks

Almost Sure Convergence of Average Reward Temporal Difference Learning

Gradient compensation traces based temporal difference learning

Gradient Descent Temporal Difference-Difference Learning

Context-aware Active Multi-Step Reinforcement Learning

QC-ODKLA: Quantized and Communication-Censored Online Decentralized Kernel Learning via Linearized ADMM

Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning

Effective Multi-step Temporal-Difference Learning for Non-Linear Function Approximation

Robust kernel adaptive filtering for nonlinear time series prediction