Online Hierarchical Reinforcement Learning Based on Interrupting Option

ZHU Fei, XU Zhi-peng, LIU Quan, FU Yu-chen, WANG Hui
DOI: https://doi.org/10.11959/j.issn.1000-436x.2016117
2016-01-01
Abstract: To cope with large volumes of data, an online updating algorithm named Macro-Q with in-place updating (MQIU) is proposed. It builds on the Macro-Q algorithm and adopts an in-place updating approach. MQIU updates both the value function of abstract actions and the value function of primitive actions, which speeds up convergence. By introducing an interruption mechanism, a model-free interrupting Macro-Q option learning algorithm (IMQ), based on hierarchical reinforcement learning, is also proposed. IMQ is intended to handle variability that is hard to process with the conventional Markov decision process model and abstract actions, so that it can learn and improve control policies in a dynamic environment. Simulations show that MQIU speeds up convergence and can therefore handle larger-scale data, and that IMQ solves the task faster with stable learning performance.
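The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the standard SMDP form of the Macro-Q update and the common interruption rule of abandoning an option when another choice has a higher estimated value in the current state. The function names, the tabular `q` dictionary, and the parameters `alpha` and `gamma` are hypothetical choices made for illustration. A primitive action is treated as a one-step option, so the same table and the same update serve both abstract and primitive actions, and each write is visible to later updates immediately (in place).

```python
from collections import defaultdict


def macro_q_update(q, state, option, rewards, next_state, next_choices,
                   alpha=0.1, gamma=0.9):
    """Assumed SMDP-style Macro-Q update, applied in place to the table q.

    rewards holds the per-step rewards collected while the option ran;
    a primitive action is the special case len(rewards) == 1.
    """
    k = len(rewards)
    # Discounted return accumulated while the option was executing.
    discounted = sum((gamma ** i) * r for i, r in enumerate(rewards))
    # Greedy value of the state in which the option terminated.
    best_next = max((q[(next_state, c)] for c in next_choices), default=0.0)
    target = discounted + (gamma ** k) * best_next
    q[(state, option)] += alpha * (target - q[(state, option)])
    return q[(state, option)]


def should_interrupt(q, state, current_option, choices):
    """Assumed interruption test: stop the running option when some
    available choice is valued strictly higher in the current state."""
    return max(q[(state, c)] for c in choices) > q[(state, current_option)]


# Hypothetical usage: one table shared by options and primitive actions.
q = defaultdict(float)
macro_q_update(q, "s0", "go_to_door", [1.0, 1.0], "s2", ["go_to_door"],
               alpha=0.5, gamma=0.5)
```

Keeping a single table means a primitive-action update made during an option's execution can change whether `should_interrupt` fires on the very next step, which is the kind of responsiveness to a changing environment that the abstract attributes to interruption.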