Hierarchical reinforcement learning with unlimited option scheduling for sparse rewards in continuous spaces
Zhigang Huang, Quan Liu, Fei Zhu, Lihua Zhang, Lan Wu
DOI: https://doi.org/10.1016/j.eswa.2023.121467
IF: 8.5
2023-09-22
Expert Systems with Applications
Abstract: The fundamental concept behind option-based hierarchical reinforcement learning (O-HRL) is to obtain temporally coarse-grained actions and to abstract complex situations. Although O-HRL is intended for sparse rewards, it remains difficult to extend to sparse-reward problems in continuous spaces. In this paper, we offer a fresh perspective on option techniques, interpreting different options in terms of knowledge representation, and propose hierarchical reinforcement learning with unlimited option scheduling (UOS). Unlike conventional O-HRL algorithms, which apply a limited set of options with specific meanings, UOS encourages an infinite number of options to correlate with trajectories while remaining correlated with each other, thus representing richer knowledge. These unlimited options can guide infinitely many diverse trajectories that cover fine-grained state spaces. Further, a composite scheduling mode is proposed to generate arbitrary-length trajectories with intrinsic characteristics, providing both flexibility and concentration for the unlimited options. This mode significantly improves the performance and robustness of UOS. Finally, a new comprehensive experimental system is developed, and the experimental results demonstrate the notable success of UOS on sparse-reward tasks in continuous spaces. The results also identify the root cause of UOS's superiority from the perspective of knowledge representation.
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science