Recursive least-squares TD (λ) learning algorithm based on improved extreme learning machine

Yuan XU,Bingming HUANG,Yanlin HE
DOI: https://doi.org/10.11949/j.issn.0438-1157.20161555
2017-01-01
Abstract:To meet the requirements on accuracy and computational time of value approximation algorithms, a recursive least-squares temporal difference reinforcement learning algorithm with eligibility traces based on improved extreme learning machine (RLSTD(λ)-IELM) was proposed. First, a recursive least-squares temporal difference reinforcement learning (RLSTD) was created by introducing recursive method into least-squares temporal difference reinforcement learning algorithm (LSTD), in order to eliminate matrix inversion process in least-squares algorithm and to reduce complexity and computation of the proposed algorithm. Then, eligibility trace was introduced into RLSTD algorithm to form the recursive least-squares temporal difference reinforcement learning algorithm with eligibility trace (RLSTD(λ)), in order to solve issues of slow convergence speed of LSTD(0) and low efficiency of experience exploitation. Furthermore, since value function in most reinforcement learning problem was monotonic, a single suppressed approximation Softplus function was used to replace sigmoid activation function in the extreme learning machine network in order to reduce computation load and improve computing speed. The experiment result on generalized Hop-world problem demonstrated that the proposed algorithm RLSTD(λ)-IELM had faster computing speed than the least-squares temporal difference learning algorithm based on extreme learning machine (LSTD-ELM), and better accuracy than the least-squares temporal difference learning algorithm based on radial basis functions (LSTD-RBF).
What problem does this paper attempt to address?