Abstract: The recursive least-squares (RLS) algorithm is one of the best-known algorithms used in adaptive filtering, system identification and adaptive control. Its popularity is mainly due to its fast convergence speed, which is considered to be optimal in practice. In this paper, RLS methods are used to solve reinforcement learning problems, and two new reinforcement learning algorithms using linear value function approximators are proposed and analyzed. The two algorithms are called RLS-TD(lambda) and Fast-AHC (Fast Adaptive Heuristic Critic), respectively. RLS-TD(lambda) can be viewed as the extension of RLS-TD(0) from lambda=0 to general lambda within the interval [0,1], so it is a multi-step temporal-difference (TD) learning algorithm using RLS methods. The convergence with probability one and the limit of convergence of RLS-TD(lambda) are proved for ergodic Markov chains. Compared to the existing LS-TD(lambda) algorithm, RLS-TD(lambda) has advantages in computation and is more suitable for online learning. The effectiveness of RLS-TD(lambda) is analyzed and verified by learning-prediction experiments on Markov chains with a wide range of parameter settings. The Fast-AHC algorithm is derived by applying the proposed RLS-TD(lambda) algorithm in the critic network of the adaptive heuristic critic method. Unlike the conventional AHC algorithm, Fast-AHC makes use of RLS methods to improve the learning-prediction efficiency in the critic. Learning control experiments on the cart-pole balancing and the acrobot swing-up problems are conducted to compare the data efficiency of Fast-AHC with that of conventional AHC. The experimental results show that the data efficiency of learning control can also be improved by using RLS methods in the learning-prediction process of the critic. The performance of Fast-AHC is also compared with that of the AHC method using LS-TD(lambda).
Furthermore, it is demonstrated in the experiments that different initial values of the variance matrix in RLS-TD(lambda) are required to get better performance not only in learning prediction but also in learning control. The experimental results are analyzed based on the existing theoretical work on the transient phase of forgetting factor RLS methods.
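
The abstract does not state the update equations, so the following is only a minimal sketch of a textbook-style recursive least-squares TD(lambda) update with a linear value function approximator, written in plain Python. It is not claimed to be the paper's exact formulation. The function name `rls_td_lambda`, the parameter names (`mu` for the forgetting factor, `delta` for the scale of the initial variance matrix) and the two-state test chain are all assumptions made for illustration. The sketch also assumes one continuing trajectory, so the eligibility trace is never reset.

```python
def rls_td_lambda(transitions, n_features, gamma, lam, mu=1.0, delta=100.0):
    """Sketch of an RLS-style TD(lambda) update (illustrative, assumed form).

    transitions: iterable of (phi, reward, phi_next), each phi a list of floats.
    mu:    forgetting factor (1.0 means no forgetting).
    delta: the initial variance matrix is delta * I.
    """
    n = n_features
    w = [0.0] * n                                   # weight vector
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    z = None                                        # eligibility trace

    for phi, reward, phi_next in transitions:
        if z is None:
            z = list(phi)                           # trace starts at first features
        # d = phi(x_t) - gamma * phi(x_{t+1})
        d = [phi[i] - gamma * phi_next[i] for i in range(n)]
        # P z (column vector) and d^T P (row vector)
        Pz = [sum(P[i][j] * z[j] for j in range(n)) for i in range(n)]
        dP = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
        denom = mu + sum(d[i] * Pz[i] for i in range(n))
        gain = [Pz[i] / denom for i in range(n)]
        # Weight update driven by the one-step residual r - d^T w
        residual = reward - sum(d[i] * w[i] for i in range(n))
        w = [w[i] + gain[i] * residual for i in range(n)]
        # Rank-one (Sherman-Morrison) update of the variance matrix
        P = [[(P[i][j] - gain[i] * dP[j]) / mu for j in range(n)]
             for i in range(n)]
        # Decay the trace and add the next state's features
        z = [gamma * lam * z[i] + phi_next[i] for i in range(n)]
    return w


def two_state_cycle(steps):
    """Hypothetical test chain: state 0 -> 1 -> 0 -> ..., reward 1 in state 0."""
    feats = [[1.0, 0.0], [0.0, 1.0]]                # tabular (one-hot) features
    rewards = [1.0, 0.0]
    state = 0
    for _ in range(steps):
        nxt = 1 - state
        yield feats[state], rewards[state], feats[nxt]
        state = nxt


weights = rls_td_lambda(two_state_cycle(200), n_features=2, gamma=0.5, lam=0.5)
print(weights)
```

For this hypothetical chain with discount 0.5, the true values solve V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0), giving V(0) = 4/3 and V(1) = 2/3, so the learned weights should come close to those numbers. The `delta` argument plays the role of the initial variance matrix that the abstract says affects performance. In this sketch a larger `delta` weakens the implicit regularization toward the initial weights.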

Recursive least-squares TD (λ) learning algorithm based on improved extreme learning machine

A Fast Incremental Method Based on Regularized Extreme Learning Machine

Efficient Reinforcement Learning Using Recursive Least-Squares Methods

A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning

Optimizing Extreme Learning Machine Via Generalized Hebbian Learning and Intrinsic Plasticity Learning

An Incremental Extreme Learning Machine for Online Sequential Learning Problems

Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator

Incremental regularized extreme learning machine and it's enhancement.

Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning

Off-Policy Training for Truncated TD(λ) Boosted Soft Actor-Critic.

An Enhanced Extreme Learning Machine Based on Square-Root Lasso Method

ELM-KL-LSTM: a robust and general incremental learning method for efficient classification of time series data

META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning

A derived least square extreme learning machine

Learning to Mix n-Step Returns: Generalizing lambda-Returns for Deep Reinforcement Learning

Revisiting Recursive Least Squares for Training Deep Neural Networks

Modified Retrace for Off-Policy Temporal Difference Learning.

A Variance Minimization Approach to Temporal-Difference Learning

Controlling Estimation Error in Reinforcement Learning via Reinforced Operation

L1-PLS Based on Incremental Extreme Learning Machine