Abstract: For online updates and data efficiency, eligibility traces transform forward-view algorithms into backward-view ones, such as temporal difference (TD) learning and its control variants. Existing research on eligibility traces, such as TD(<math>λ</math>) and true-online TD(<math>λ</math>), mainly focuses on the equivalence between forward and backward views. However, the choice of <math>λ</math> determines the time scope of credit assignment, and a small <math>λ</math> accelerates the decay of credit over time. This paper proposes a different implementation of the backward view, named gradient compensation traces (GCT). GCT compensates online for the difference between the bootstrapped gradient estimate and the true gradient, removing the extra decay of credit. Based on GCT, the corresponding temporal difference learning algorithm (gradient compensation TD, GCTD) is proved to converge under certain conditions. The sensitivity of GCTD's hyperparameter is analyzed on the nonlinear long-corridor and linear random-walk tasks. The proposed algorithm is comparable to true-online TD(<math>λ</math>) on the basic Mountain Car task and outperforms the baselines in the sparse-reward setting.
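To make the role of <math>λ</math> concrete, the following is a minimal sketch of conventional tabular TD(<math>λ</math>) with accumulating traces on a small random walk. It is not the GCT or GCTD algorithm, whose update rule is not given in the abstract. The function names, the five-state environment, the reward of 1 at the right terminal, and the step-size values are all assumptions chosen for illustration. The helper `trace_weights` shows how the credit left on a visited state shrinks by a factor of γλ per step, which is the decay the abstract refers to.

```python
import numpy as np

def trace_weights(lam, gamma=1.0, steps=5):
    """Credit an accumulating trace keeps on a state k steps after one visit."""
    return [(gamma * lam) ** k for k in range(steps)]

def td_lambda_random_walk(lam, alpha=0.1, gamma=1.0, n_states=5,
                          episodes=200, seed=0):
    """Standard tabular TD(lambda), accumulating traces (illustrative only)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(n_states)            # one value estimate per state
    for _ in range(episodes):
        z = np.zeros(n_states)        # eligibility trace, reset each episode
        s = n_states // 2             # start in the middle state
        while True:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            done = s_next < 0 or s_next >= n_states
            # Assumed reward: 1 on exiting to the right, 0 otherwise.
            reward = 1.0 if s_next >= n_states else 0.0
            v_next = 0.0 if done else w[s_next]
            delta = reward + gamma * v_next - w[s]   # TD error
            z *= gamma * lam          # decay credit on all earlier states
            z[s] += 1.0               # add credit to the current state
            w += alpha * delta * z    # backward-view update
            if done:
                break
            s = s_next
    return w

if __name__ == "__main__":
    print(trace_weights(0.5, steps=3))
    print(td_lambda_random_walk(lam=0.5))
```

With `lam=0.5` and `gamma=1.0`, the credit on a state halves at every step after the visit, so a reward arriving many steps later barely updates early states. This is the effect that motivates removing the extra decay.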

Gradient compensation traces based temporal difference learning
