Hierarchical Reinforcement Learning Based on System Model

ZHENG Yu, LUO Si-wei, LU Zi-ang
DOI: https://doi.org/10.3969/j.issn.1673-0291.2006.05.001
2006-01-01
Abstract: This paper examines the low learning efficiency of reinforcement learning caused by improper generalization and a random exploration policy in deterministic MDPs, and proposes a hierarchical reinforcement learning algorithm based on a system model. The algorithm adopts a two-layer structure. The low layer selects actions with a greedy policy. The high layer monitors and analyses the state values in the state space, guides the learning of the low layer, and corrects wrong actions selected by the low layer. The high layer has three roles: it reduces the effect of improper generalization on state-value convergence by setting different learning parameters for state-value updates across the state space; it builds control rules in the state space and accelerates learning by selecting actions according to those rules; and it reduces exploration of the uncontrollable state space and of non-optimal actions, concentrating exploration on the controllable space. The proposed algorithm achieves control quickly. Simulation results for the control of a double inverted pendulum are presented to show the effectiveness of the proposed algorithm.
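The two-layer idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no update equations, so the class `TwoLayerAgent`, the six-state chain task, the reward values, the one-step lookahead on a known deterministic model, and the rule-recording criterion (store the action that reaches the goal) are all assumptions chosen for illustration. The low layer acts greedily on state values. The high layer holds per-state learning rates, a control-rule table, and a set of uncontrollable states, and it overrides a greedy action that would enter that set.

```python
from typing import Callable, Dict, List, Set


class TwoLayerAgent:
    """Illustrative two-layer agent for a deterministic MDP (hypothetical design)."""

    def __init__(self, n_states: int, actions: List[int],
                 model: Callable[[int, int], int],
                 reward: Callable[[int, int], float],
                 goal: int, gamma: float = 0.9) -> None:
        self.V = [0.0] * n_states          # state values
        self.actions = actions
        self.model = model                 # system model: (state, action) -> next state
        self.reward = reward
        self.goal = goal
        self.gamma = gamma
        # High-layer data (all assumed structures):
        self.alpha = [0.5] * n_states      # per-state learning rates
        self.rules: Dict[int, int] = {}    # control rules: state -> action
        self.uncontrollable: Set[int] = set()

    def _lookahead(self, s: int, a: int) -> float:
        return self.reward(s, a) + self.gamma * self.V[self.model(s, a)]

    def low_layer_action(self, s: int) -> int:
        # Greedy policy: best one-step lookahead under the model.
        return max(self.actions, key=lambda a: self._lookahead(s, a))

    def high_layer_action(self, s: int) -> int:
        # 1) Follow a control rule if one exists for this state.
        if s in self.rules:
            return self.rules[s]
        # 2) Otherwise take the low layer's choice, correcting it if it
        #    would lead into the uncontrollable region.
        a = self.low_layer_action(s)
        if self.model(s, a) in self.uncontrollable:
            safe = [b for b in self.actions
                    if self.model(s, b) not in self.uncontrollable]
            if safe:
                a = max(safe, key=lambda b: self._lookahead(s, b))
        return a

    def update(self, s: int, a: int) -> int:
        # Value update with the learning rate the high layer set for state s.
        s_next = self.model(s, a)
        target = self.reward(s, a) + self.gamma * self.V[s_next]
        self.V[s] += self.alpha[s] * (target - self.V[s])
        if s_next == self.goal:            # assumed rule-building criterion
            self.rules[s] = a
        return s_next


# Toy chain 0..5: state 0 is treated as uncontrollable, state 5 is the goal.
def chain_model(s: int, a: int) -> int:
    return max(0, min(5, s + a))


def chain_reward(s: int, a: int) -> float:
    return 10.0 if chain_model(s, a) == 5 else -1.0


agent = TwoLayerAgent(6, [-1, 1], chain_model, chain_reward, goal=5)
agent.uncontrollable = {0}

s, path = 2, [2]
for _ in range(20):
    if s == agent.goal:
        break
    s = agent.update(s, agent.high_layer_action(s))
    path.append(s)
print(path)
```

In this toy run the low layer alone would step from state 1 into state 0, while the high layer redirects it, so the trajectory stays in the controllable region. Lowering `agent.alpha[s]` for selected states is one way to mimic the per-state learning parameters mentioned in the abstract.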