Automatic Discovery of Subgoals in Reinforcement Learning Using Unique-Direction Value

Chuan Shi, Rui Huang, Zhongzhi Shi
DOI: https://doi.org/10.1109/COGINF.2007.4341927
2007-01-01
Abstract: Options have proven useful for discovering hierarchical structure in reinforcement learning and thereby speeding up learning. The key problem in automatic option discovery is finding subgoals. Although approaches based on visiting frequency have attracted much research attention, many of them fail to distinguish subgoals from nearby states. Based on an action-restricted property of subgoals that we identify, subgoals can be regarded as the states on the paths that best match the action-restricted property. For the grid-world environment, we introduce the concept of unique-direction value, which embodies the action-restricted property, to find these states. Experimental results show that the proposed approach finds subgoals correctly and that Q-learning with the discovered options learns much faster.