Historical Temporal Difference Learning: Some Initial Results

Hengshuai Yao, Diao Dongcui, Zengqi Sun
DOI: https://doi.org/10.1109/IMSCCS.2006.231
2006-01-01
Abstract: In this paper, we develop a multi-step prediction algorithm that is guaranteed to converge when used with general function approximation. In addition, the new algorithm should satisfy two requirements. First, it need not be faster than TD(0) in the look-up table representation, but it should be faster than the residual gradient method. Second, it should learn optimally.