Efficient Reinforcement-Learning Control Algorithm Using Experience Reuse

Hao Chuan-chuan, Fang Zhou, Li Ping
DOI: https://doi.org/10.3969/j.issn.1000-565X.2012.06.012
2012-01-01
Abstract: The episodic Natural Actor-Critic (eNAC) algorithm, an episode-based reinforcement-learning control algorithm, has excellent learning performance in theory, but it learns inefficiently because it needs many episodes to obtain a good policy. To address this problem, a new algorithm named ER-eNAC is proposed, which adds an episode reuse mechanism to eNAC. In ER-eNAC, some past episodes are reused when estimating the current natural policy gradient, so that experience is used more efficiently. Each reused episode is weighted by an exponential decay in the number of policy updates it has undergone, which reflects how well it fits the current policy. The proposed algorithm is then applied to inverted pendulum control. Simulation results show that, compared with eNAC, ER-eNAC significantly reduces the number of episodes required for learning and markedly improves learning efficiency.
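The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the weight of an episode is taken to be `decay ** age`, where `age` is the number of policy updates since the episode was collected, and the natural gradient is estimated by a weighted least-squares regression of episode returns on per-episode sums of log-policy gradients. The names `decay_weights`, `weighted_natural_gradient`, and `decay` are hypothetical and do not come from the paper.

```python
import numpy as np


def decay_weights(ages, decay=0.8):
    """Return decay**age for each episode (assumed weighting form).

    ages: number of policy updates each stored episode has undergone.
    """
    ages = np.asarray(ages, dtype=float)
    return decay ** ages


def weighted_natural_gradient(features, returns, ages, decay=0.8):
    """Weighted least-squares fit of returns ~ features @ w + baseline.

    features: (n_episodes, n_params) per-episode sums of discounted
              log-policy gradients
    returns:  (n_episodes,) discounted episode returns
    ages:     (n_episodes,) policy updates since each episode was collected
    Returns (w, baseline); w plays the role of the gradient estimate.
    """
    features = np.asarray(features, dtype=float)
    returns = np.asarray(returns, dtype=float)
    weights = decay_weights(ages, decay)
    # A constant column lets the fit estimate a baseline alongside w.
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    # Scaling rows by sqrt(weight) turns weighted LS into ordinary LS.
    root = np.sqrt(weights)
    solution, *_ = np.linalg.lstsq(
        design * root[:, None], returns * root, rcond=None
    )
    return solution[:-1], solution[-1]


# Fresh episodes (age 0) keep full weight; older ones count for less.
print(decay_weights([0, 1, 2], decay=0.5))  # [1.   0.5  0.25]
```

With `decay=1.0` all stored episodes count equally, and with `decay` close to 0 only the newest episodes matter, so under these assumptions the parameter trades sample reuse against staleness of old data.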