Abstract: In this paper, an online selective kernel-based temporal difference (OSKTD) learning algorithm is proposed to deal with large-scale and/or continuous reinforcement learning problems. OSKTD includes two online procedures: online sparsification and parameter updating for the selective kernel-based value function. A new sparsification method (a kernel distance-based online sparsification method) is proposed based on selective ensemble learning; it is computationally less complex than other sparsification methods. With the proposed method, the sparsified dictionary of samples is constructed online by checking whether each new sample needs to be added to the dictionary. In addition, based on local validity, a selective kernel-based value function is proposed that selects the best samples from the sample dictionary for the value function approximator. The parameters of the selective kernel-based value function are updated iteratively by the temporal difference (TD) learning algorithm combined with gradient descent. The complexity of the online sparsification procedure in OSKTD is O(n). OSKTD is compared on two typical experiments (Maze and Mountain Car) with both traditional and recent O(n) algorithms (GTD, GTD2, and TDC using the kernel-based value function), and the results demonstrate the effectiveness of the proposed algorithm. In the Maze problem, OSKTD converges to an optimal policy and converges faster than both the traditional and the recent algorithms. In the Mountain Car problem, OSKTD converges, requires less computation time than other sparsification methods, reaches a better local optimum than the traditional algorithms, and converges much faster than the recent algorithms. In addition, the final optimum reached by OSKTD is competitive with that of the recent algorithms.
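The two online procedures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Gaussian kernel, the thresholds `mu_add` and `mu_select`, the step size `alpha`, and the class name `SelectiveKernelTD` are all assumptions chosen for illustration, and the update shown is plain TD(0) with a gradient step on the selected kernel weights.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y); the kernel choice is an assumption."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_distance(x, y, sigma=1.0):
    """Squared feature-space distance k(x,x) - 2k(x,y) + k(y,y).

    For a Gaussian kernel k(x,x) = 1, so this equals 2 - 2k(x,y).
    """
    return 2.0 - 2.0 * gaussian_kernel(x, y, sigma)


class SelectiveKernelTD:
    """Hypothetical sketch: online sparsification plus selective TD(0)."""

    def __init__(self, alpha=0.1, gamma=0.9, sigma=1.0,
                 mu_add=0.1, mu_select=1.0):
        self.alpha = alpha          # step size (assumed)
        self.gamma = gamma          # discount factor (assumed)
        self.sigma = sigma          # kernel width (assumed)
        self.mu_add = mu_add        # add-to-dictionary threshold (assumed)
        self.mu_select = mu_select  # local-validity threshold (assumed)
        self.dictionary = []        # sparsified sample dictionary
        self.theta = []             # one weight per dictionary sample

    def sparsify(self, s):
        """Add s if it is far from every stored sample: one O(n) pass."""
        if all(kernel_distance(s, d, self.sigma) > self.mu_add
               for d in self.dictionary):
            self.dictionary.append(tuple(s))
            self.theta.append(0.0)
            return True
        return False

    def features(self, s):
        """Kernel features, zeroed for samples outside the local region."""
        feats = []
        for d in self.dictionary:
            near = kernel_distance(s, d, self.sigma) < self.mu_select
            feats.append(gaussian_kernel(s, d, self.sigma) if near else 0.0)
        return feats

    def value(self, s):
        return sum(t * f for t, f in zip(self.theta, self.features(s)))

    def update(self, s, r, s_next, terminal=False):
        """One TD(0) gradient step on the selected weights; returns delta."""
        self.sparsify(s)
        if not terminal:
            self.sparsify(s_next)
        phi = self.features(s)
        bootstrap = 0.0 if terminal else self.gamma * self.value(s_next)
        delta = r + bootstrap - self.value(s)
        for i, f in enumerate(phi):
            self.theta[i] += self.alpha * delta * f
        return delta
```

In this sketch a sample enters the dictionary only when its kernel distance to every stored sample exceeds `mu_add`, and only dictionary samples within `mu_select` of the query state contribute to the value estimate, which is one simple reading of "local validity".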

Online Sparse Temporal Difference Learning Based on Nested Optimization and Regularized Dual Averaging

Online attentive kernel-based temporal difference learning

Statistical Inference for Temporal Difference Learning with Linear Function Approximation

Differentially Private Temporal Difference Learning with Stochastic Nonconvex-Strongly-Concave Optimization

Online Selective Kernel-Based Temporal Difference Learning

Orthogonal Matching Pursuit for Least Squares Temporal Difference with Gradient Correction

Almost Sure Convergence of Average Reward Temporal Difference Learning

Target-Based Temporal Difference Learning

Investigating practical linear temporal difference learning

Federated Temporal Difference Learning with Linear Function Approximation under Environmental Heterogeneity

Soft Policy Optimization Using Dual-Track Advantage Estimator

Off-Policy Temporal Difference Learning with Bellman Residuals

Is Temporal Difference Learning Optimal? an Instance-Dependent Analysis

Reanalysis of Variance Reduced Temporal Difference Learning

META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

Decentralized Adaptive TD($\lambda$) Learning with Linear Function Approximation: Nonasymptotic Analysis

Kernel Recursive Least-Squares Temporal Difference Algorithms With Sparsification And Regularization

A Variance Minimization Approach to Temporal-Difference Learning

Data Efficient Deep Reinforcement Learning with Action-Ranked Temporal Difference Learning

A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning