Experience Replay for Least-Squares Policy Iteration

Quan Liu, Xin Zhou, Fei Zhu, Qiming Fu, Yuchen Fu
DOI: https://doi.org/10.1109/JAS.2014.7004685
2014-01-01
IEEE/CAA Journal of Automatica Sinica
Abstract: Policy iteration, which evaluates and improves the control policy iteratively, is a reinforcement learning method. Policy evaluation with the least-squares method can draw more useful information from the empirical data and therefore improves data validity. However, most existing online least-squares policy iteration methods use each sample only once, resulting in a low utilization rate. With the goal of improving utilization efficiency, we propose experience replay for least-squares policy iteration (ERLSPI) and prove its convergence. The ERLSPI method combines online least-squares policy iteration with experience replay: it stores the samples generated online and reuses them with the least-squares method to update the control policy. We apply the ERLSPI method to the inverted pendulum system, a typical benchmark problem. The experimental results show that the method can effectively take advantage of previous experience and knowledge, improve the utilization efficiency of empirical data, and accelerate convergence.
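The idea of storing online samples and reusing them in a least-squares policy update can be illustrated with a minimal sketch. This is not the authors' implementation: the two-state toy environment, the one-hot features, the ridge term, the fixed exploratory action sequence, and all function names (`step`, `phi`, `lstdq`, `run`) are assumptions made for illustration, and the evaluation step is a plain batch LSTD-Q solve over the stored buffer.

```python
GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2
DIM = N_STATES * N_ACTIONS

def step(s, a):
    """Hypothetical toy dynamics: action 0 stays, action 1 switches state.
    Reward is 1 when the next state is state 1, otherwise 0."""
    s2 = s if a == 0 else 1 - s
    return (1.0 if s2 == 1 else 0.0), s2

def phi(s, a):
    """One-hot (tabular) feature vector for a state-action pair."""
    f = [0.0] * DIM
    f[s * N_ACTIONS + a] = 1.0
    return f

def greedy(w, s):
    """Greedy action under the linear Q estimate; ties go to action 0."""
    return max(range(N_ACTIONS), key=lambda a: w[s * N_ACTIONS + a])

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def lstdq(buffer, w, ridge=1e-6):
    """One LSTD-Q evaluation of the policy that is greedy w.r.t. w,
    using every stored sample. The ridge term is an assumption that
    keeps A invertible before all state-action pairs have been seen."""
    A = [[ridge if i == j else 0.0 for j in range(DIM)] for i in range(DIM)]
    b = [0.0] * DIM
    for s, a, r, s2 in buffer:
        f = phi(s, a)
        f2 = phi(s2, greedy(w, s2))
        for i in range(DIM):
            if f[i]:
                b[i] += f[i] * r
                for j in range(DIM):
                    A[i][j] += f[i] * (f[j] - GAMMA * f2[j])
    return solve(A, b)

def run(actions, replays=3):
    """Collect samples online, store them, and replay the whole buffer
    several times after each step to re-evaluate and improve the policy."""
    buffer, w, s = [], [0.0] * DIM, 0
    for a in actions:
        r, s2 = step(s, a)
        buffer.append((s, a, r, s2))   # store the online sample
        for _ in range(replays):       # reuse all stored samples
            w = lstdq(buffer, w)
        s = s2
    return w, [greedy(w, st) for st in range(N_STATES)]

# Fixed exploratory action sequence (an assumption; it visits every pair).
w, policy = run([0, 1, 0, 1, 1, 0, 1, 0])
```

On this toy problem the learned greedy policy should switch in state 0 and stay in state 1. The point of the sketch is the loop structure in `run`: each transition is kept in the buffer and contributes to every later least-squares solve, rather than being used once and discarded.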