Deep Reinforcement Learning with Hierarchical Structures.

Siyuan Li

DOI: https://doi.org/10.24963/ijcai.2021/681

2021-01-01

Abstract:Hierarchical reinforcement learning (HRL), which enables control at multiple time scales, is a promising paradigm to solve challenging and long-horizon tasks. In this paper, we briefly introduce our work in bottom-up and top-down HRL and outline the directions for future work.

Advances in Hierarchical Reinforcement Learning

CHENG Xiao-bei,SHEN Jing,LIU Hai-bo,GU Guo-chang,ZHANG Guo-yin

DOI: https://doi.org/10.3778/j.issn.1002-8331.2008.13.001

2008-01-01

Abstract:Reinforcement learning is an approach in which an agent learns its behaviors through trial-and-error interaction with a dynamic environment. It has become an important branch of machine learning owing to its self-learning and online learning capabilities. However, reinforcement learning is bedeviled by the curse of dimensionality. Recently, hierarchical reinforcement learning (HRL) has made great progress in combating the curse of dimensionality, and HRL approaches have been applied to multi-agent systems. This paper surveys the recent advances in HRL, then discusses some open problems, and finally outlines the prospects for HRL.
Hierarchical Reinforcement Learning: A Survey and Open Research Challenges

Matthias Hutsebaut-Buysse,Kevin Mets,Steven Latré

DOI: https://doi.org/10.3390/make4010009

2022-02-17

Machine Learning and Knowledge Extraction

Abstract:Reinforcement learning (RL) allows an agent to solve sequential decision-making problems by interacting with an environment in a trial-and-error fashion. When these environments are very complex, pure random exploration of possible solutions often fails, or is very sample inefficient, requiring an unreasonable amount of interaction with the environment. Hierarchical reinforcement learning (HRL) utilizes forms of temporal- and state-abstractions in order to tackle these challenges, while simultaneously paving the road for behavior reuse and increased interpretability of RL systems. In this survey paper we first introduce a selection of problem-specific approaches, which provided insight in how to utilize often handcrafted abstractions in specific task settings. We then introduce the Options framework, which provides a more generic approach, allowing abstractions to be discovered and learned semi-automatically. Afterwards we introduce the goal-conditional approach, which allows sub-behaviors to be embedded in a continuous space. In order to further advance the development of HRL agents, capable of simultaneously learning abstractions and how to use them, solely from interaction with complex high dimensional environments, we also identify a set of promising research directions.
Summarize of hierarchical reinforcement learning

Wenji ZHOU,Yang YU

DOI: https://doi.org/10.11992/tis.201706031

2017-01-01

Abstract:Reinforcement Learning (RL) is an important research area in the field of machine learning and artificial intelligence and has received increasing attention in recent years. The goal in RL is to maximize long-term total reward by interacting with the environment. Traditional RL algorithms are limited due to the so-called curse of dimensionality, and their learning abilities degrade drastically with increases in the dimensionality of the state space. Hierarchical reinforcement learning (HRL) decomposes the RL problem into sub-problems and solves each of them to improve learning ability. HRL offers a potential way to solve large-scale RL, which has received insufficient attention to date. In this paper, we introduce and review several main HRL methods.
Exploring the limits of Hierarchical World Models in Reinforcement Learning

Robin Schiewer,Anand Subramoney,Laurenz Wiskott

2024-06-02

Abstract:Hierarchical model-based reinforcement learning (HMBRL) aims to combine the benefits of better sample efficiency of model based reinforcement learning (MBRL) with the abstraction capability of hierarchical reinforcement learning (HRL) to solve complex tasks efficiently. While HMBRL has great potential, it still lacks wide adoption. In this work we describe a novel HMBRL framework and evaluate it thoroughly. To complement the multi-layered decision making idiom characteristic for HRL, we construct hierarchical world models that simulate environment dynamics at various levels of temporal abstraction. These models are used to train a stack of agents that communicate in a top-down manner by proposing goals to their subordinate agents. A significant focus of this study is the exploration of a static and environment agnostic temporal abstraction, which allows concurrent training of models and agents throughout the hierarchy. Unlike most goal-conditioned H(MB)RL approaches, it also leads to comparatively low dimensional abstract actions. Although our HMBRL approach did not outperform traditional methods in terms of final episode returns, it successfully facilitated decision making across two levels of abstraction using compact, low dimensional abstract actions. A central challenge in enhancing our method's performance, as uncovered through comprehensive experimentation, is model exploitation on the abstract level of our world model stack. We provide an in depth examination of this issue, discussing its implications for the field and suggesting directions for future research to overcome this challenge. By sharing these findings, we aim to contribute to the broader discourse on refining HMBRL methodologies and to assist in the development of more effective autonomous learning systems for complex decision-making environments.

Machine Learning
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

Tejas D. Kulkarni,Karthik R. Narasimhan,Ardavan Saeedi,Joshua B. Tenenbaum

DOI: https://doi.org/10.48550/arXiv.1604.06057

IF: 5.414

2016-04-20

Machine Learning

Abstract:Learning goal-directed behavior in environments with sparse feedback is a major challenge for reinforcement learning algorithms. The primary difficulty arises due to insufficient exploration, resulting in an agent being unable to learn robust value functions. Intrinsically motivated agents can explore new behavior for its own sake rather than to directly solve problems. Such intrinsic behaviors could eventually help the agent solve tasks posed by the environment. We present hierarchical-DQN (h-DQN), a framework to integrate hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning. A top-level value function learns a policy over intrinsic goals, and a lower-level function learns a policy over atomic actions to satisfy the given goals. h-DQN allows for flexible goal specifications, such as functions over entities and relations. This provides an efficient space for exploration in complicated environments. We demonstrate the strength of our approach on two problems with very sparse, delayed feedback: (1) a complex discrete stochastic decision process, and (2) the classic ATARI game 'Montezuma's Revenge'.
Hierarchical Reinforcement Learning in Complex 3D Environments

Bernardo Avila Pires,Feryal Behbahani,Hubert Soyer,Kyriacos Nikiforou,Thomas Keck,Satinder Singh

DOI: https://doi.org/10.48550/arXiv.2302.14451

2023-02-28

Abstract:Hierarchical Reinforcement Learning (HRL) agents have the potential to demonstrate appealing capabilities such as planning and exploration with abstraction, transfer, and skill reuse. Recent successes with HRL across different domains provide evidence that practical, effective HRL agents are possible, even if existing agents do not yet fully realize the potential of HRL. Despite these successes, visually complex partially observable 3D environments remained a challenge for HRL agents. We address this issue with Hierarchical Hybrid Offline-Online (H2O2), a hierarchical deep reinforcement learning agent that discovers and learns to use options from scratch using its own experience. We show that H2O2 is competitive with a strong non-hierarchical Muesli baseline in the DeepMind Hard Eight tasks and we shed new light on the problem of learning hierarchical agents in complex environments. Our empirical study of H2O2 reveals previously unnoticed practical challenges and brings new perspective to the current understanding of hierarchical agents in complex domains.

Machine Learning,Artificial Intelligence
Causality-driven Hierarchical Structure Discovery for Reinforcement Learning

Shaohui Peng,Xing Hu,Rui Zhang,Ke Tang,Jiaming Guo,Qi Yi,Ruizhi Chen,Xishan Zhang,Zidong Du,Ling Li,Qi Guo,Yunji Chen

DOI: https://doi.org/10.48550/arXiv.2210.06964

2022-10-13

Abstract:Hierarchical reinforcement learning (HRL) effectively improves agents' exploration efficiency on tasks with sparse reward, with the guide of high-quality hierarchical structures (e.g., subgoals or options). However, how to automatically discover high-quality hierarchical structures is still a great challenge. Previous HRL methods can hardly discover the hierarchical structures in complex environments due to the low exploration efficiency by exploiting the randomness-driven exploration paradigm. To address this issue, we propose CDHRL, a causality-driven hierarchical reinforcement learning framework, leveraging a causality-driven discovery instead of a randomness-driven exploration to effectively build high-quality hierarchical structures in complicated environments. The key insight is that the causalities among environment variables are naturally fit for modeling reachable subgoals and their dependencies and can perfectly guide to build high-quality hierarchical structures. The results in two complex environments, 2D-Minecraft and Eden, show that CDHRL significantly boosts exploration efficiency with the causality-driven paradigm.

Machine Learning
Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?

Ofir Nachum,Haoran Tang,Xingyu Lu,Shixiang Gu,Honglak Lee,Sergey Levine

DOI: https://doi.org/10.48550/arXiv.1909.10618

2019-12-31

Abstract:Hierarchical reinforcement learning has demonstrated significant success at solving difficult reinforcement learning (RL) tasks. Previous works have motivated the use of hierarchy by appealing to a number of intuitive benefits, including learning over temporally extended transitions, exploring over temporally extended periods, and training and exploring in a more semantically meaningful action space, among others. However, in fully observed, Markovian settings, it is not immediately clear why hierarchical RL should provide benefits over standard "shallow" RL architectures. In this work, we isolate and evaluate the claimed benefits of hierarchical RL on a suite of tasks encompassing locomotion, navigation, and manipulation. Surprisingly, we find that most of the observed benefits of hierarchy can be attributed to improved exploration, as opposed to easier policy learning or imposed hierarchical structures. Given this insight, we present exploration techniques inspired by hierarchy that achieve performance competitive with hierarchical RL while at the same time being much simpler to use and implement.

Machine Learning,Artificial Intelligence
Temporal-adaptive Hierarchical Reinforcement Learning

Zhou Wen-Ji,Yu Yang

2020-01-01

Abstract: Hierarchical reinforcement learning (HRL) helps address large-scale and sparse reward issues in reinforcement learning. In HRL, the policy model has an inner representation structured in levels. With this structure, the reinforcement learning task is expected to be decomposed into corresponding levels with sub-tasks, and thus the learning can be more efficient. In HRL, although it is intuitive that a high-level policy only needs to make macro decisions in a low frequency, the exact frequency is hard to be simply determined. Previous HRL approaches often employed a fixed-time skip strategy or learn a terminal condition without taking account of the context, which, however, not only requires manual adjustments but also sacrifices some decision granularity. In this paper, we propose the temporal-adaptive hierarchical policy learning (TEMPLE) structure, which uses a temporal gate to adaptively control the high-level policy decision frequency. We train the TEMPLE structure with PPO and test its performance in a range of environments including 2-D rooms, Mujoco tasks, and Atari games. The results show that the TEMPLE structure can lead to improved performance in these environments with a sequential adaptive high-level control.
Deep Reinforcement Learning from Hierarchical Preference Design

Alexander Bukharin,Yixiao Li,Pengcheng He,Tuo Zhao

2024-06-10

Abstract:Reward design is a fundamental, yet challenging aspect of reinforcement learning (RL). Researchers typically utilize feedback signals from the environment to handcraft a reward function, but this process is not always effective due to the varying scale and intricate dependencies of the feedback signals. This paper shows by exploiting certain structures, one can ease the reward design process. Specifically, we propose a hierarchical reward modeling framework -- HERON for scenarios: (I) The feedback signals naturally present hierarchy; (II) The reward is sparse, but with less important surrogate feedback to help policy learning. Both scenarios allow us to design a hierarchical decision tree induced by the importance ranking of the feedback signals to compare RL trajectories. With such preference data, we can then train a reward model for policy learning. We apply HERON to several RL applications, and we find that our framework can not only train high performing agents on a variety of difficult tasks, but also provide additional benefits such as improved sample efficiency and robustness. Our code is available at https://github.com/abukharin3/HERON.

Machine Learning,Artificial Intelligence
Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes

Le Pham Tuyen,Ngo Anh Vien,Abu Layek,TaeChoong Chung

DOI: https://doi.org/10.1109/ACCESS.2018.2854283

2018-05-11

Abstract:In recent years, reinforcement learning has achieved many remarkable successes due to the growing adoption of deep learning techniques and the rapid growth in computing power. Nevertheless, it is well-known that flat reinforcement learning algorithms are often not able to learn well and data-efficient in tasks having hierarchical structures, e.g. consisting of multiple subtasks. Hierarchical reinforcement learning is a principled approach that is able to tackle these challenging tasks. On the other hand, many real-world tasks usually have only partial observability in which state measurements are often imperfect and partially observable. The problems of RL in such settings can be formulated as a partially observable Markov decision process (POMDP). In this paper, we study hierarchical RL in POMDP in which the tasks have only partial observability and possess hierarchical properties. We propose a hierarchical deep reinforcement learning approach for learning in hierarchical POMDP. The deep hierarchical RL algorithm is proposed to apply to both MDP and POMDP learning. We evaluate the proposed algorithm on various challenging hierarchical POMDP.

Artificial Intelligence
Hierarchical Reinforcement Learning Algorithm Based on Structural State-Space

MENG Jiang-hua,ZHU Ji-hong,SUN Zeng-qi

DOI: https://doi.org/10.3321/j.issn:1001-0920.2007.02.024

2007-01-01

Abstract:Based on a structured state space, a complex MDP problem is hierarchically divided into a set of simple MDP or SMDP problems according to the dimensionality of the state space. The hierarchical structure is refined during the learning process. Different hierarchical reinforcement learning (RL) methods are obtained by adopting different RL algorithms. In addition to offering faster learning and knowledge transfer, the proposed algorithm depends less on prior knowledge and can mitigate the lack of reward at the beginning of the learning process.
Diversity-Driven Extensible Hierarchical Reinforcement Learning.

Yuhang Song,Jianyi Wang,Thomas Lukasiewicz,Zhenghua Xu,Mai Xu

DOI: https://doi.org/10.1609/aaai.v33i01.33014992

2019-01-01

Proceedings of the AAAI Conference on Artificial Intelligence

Abstract:Hierarchical reinforcement learning (HRL) has recently shown promising advances on speeding up learning, improving the exploration, and discovering intertask transferable skills. Most recent works focus on HRL with two levels, i.e., a master policy manipulates subpolicies, which in turn manipulate primitive actions. However, HRL with multiple levels is usually needed in many real-world scenarios, whose ultimate goals are highly abstract, while their actions are very primitive. Therefore, in this paper, we propose a diversity-driven extensible HRL (DEHRL), where an extensible and scalable framework is built and learned levelwise to realize HRL with multiple levels. DEHRL follows a popular assumption: diverse subpolicies are useful, i.e., subpolicies are believed to be more useful if they are more diverse. However, existing implementations of this diversity assumption usually have their own drawbacks, which makes them inapplicable to HRL with multiple levels. Consequently, we further propose a novel diversity-driven solution to achieve this assumption in DEHRL. Experimental studies evaluate DEHRL with nine baselines from four perspectives in two domains; the results show that DEHRL outperforms the state-of-the-art baselines in all four aspects.
HLifeRL: A Hierarchical Lifelong Reinforcement Learning Framework

Fan Ding,Fei Zhu

DOI: https://doi.org/10.1016/j.jksuci.2022.05.001

IF: 9.006

2022-01-01

Journal of King Saud University - Computer and Information Sciences

Abstract:Deep reinforcement learning research in a single-task environment has made remarkable achievements. However, it is often plagued by catastrophic forgetting, prohibitively low sample efficiency, and a lack of scalability when facing multi-task environments. To solve these issues, a Hierarchical Lifelong Reinforcement Learning framework (HLifeRL) is proposed to enhance the ability of agents to deal with a sequence of tasks through skill discovery (we treat an option as a low-level skill in this paper) and a hierarchical policy. HLifeRL can automatically extract task-related knowledge without any human intervention or prior knowledge. Moreover, with the help of a scalable library and the master policy, we can flexibly combine various skills to complete multiple tasks in the form of call-and-return. The experimental results show that HLifeRL can accelerate single-task training and deliver remarkable stability along with scalability in a lifelong setting environment.
Boosting Reinforcement Learning via Hierarchical Game Playing With State Relay

Chanjuan Liu,Jinmiao Cong,Guangyuan Liu,Guifei Jiang,Xirong Xu,Enqiang Zhu

DOI: https://doi.org/10.1109/TNNLS.2024.3386717

2024-04-22

Abstract:Due to its wide application, deep reinforcement learning (DRL) has been extensively studied in the motion planning community in recent years. However, in the current DRL research, regardless of task completion, the state information of the agent will be reset afterward. This leads to a low sample utilization rate and hinders further explorations of the environment. Moreover, in the initial training stage, the agent has a weak learning ability in general, which affects the training efficiency in complex tasks. In this study, a new hierarchical reinforcement learning (HRL) framework dubbed hierarchical learning based on game playing with state relay (HGR) is proposed. In particular, we introduce an auxiliary penalty to regulate task difficulty, and one training mechanism, the state relay mechanism, is designed. The relay mechanism can make full use of the intermediate states of the agent and expand the environment exploration of low-level policy. Our algorithm can improve the sample utilization rate, reduce the sparse reward problem, and thereby enhance the training performance in complex environments. Simulation tests are carried out on two public experiment platforms, i.e., MazeBase and MuJoCo, to verify the effectiveness of the proposed method. The results show that HGR significantly benefits the reinforcement learning (RL) area.
Learning Representations in Model-Free Hierarchical Reinforcement Learning

Jacob Rafati,David C. Noelle

DOI: https://doi.org/10.48550/arXiv.1810.10096

2019-04-13

Abstract:Common approaches to Reinforcement Learning (RL) are seriously challenged by large-scale applications involving huge state spaces and sparse delayed reward feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address this scalability issue by learning action selection policies at multiple levels of temporal abstraction. Abstraction can be had by identifying a relatively small set of states that are likely to be useful as subgoals, in concert with the learning of corresponding skill policies to achieve those subgoals. Many approaches to subgoal discovery in HRL depend on the analysis of a model of the environment, but the need to learn such a model introduces its own problems of scale. Once subgoals are identified, skills may be learned through intrinsic motivation, introducing an internal reward signal marking subgoal attainment. In this paper, we present a novel model-free method for subgoal discovery using incremental unsupervised learning over a small memory of the most recent experiences (trajectories) of the agent. When combined with an intrinsic motivation learning mechanism, this method learns both subgoals and skills, based on experiences in the environment. Thus, we offer an original approach to HRL that does not require the acquisition of a model of the environment, suitable for large-scale applications. We demonstrate the efficiency of our method on two RL problems with sparse delayed feedback: a variant of the rooms environment and the first screen of the ATARI 2600 Montezuma's Revenge game.

Artificial Intelligence,Machine Learning,Optimization and Control
Hierarchical Reinforcement Learning for Temporal Pattern Prediction

Faith Johnson,Kristin Dana

DOI: https://doi.org/10.48550/arXiv.2310.05695

IF: 5.414

2023-10-09

Machine Learning

Abstract:In this work, we explore the use of hierarchical reinforcement learning (HRL) for the task of temporal sequence prediction. Using a combination of deep learning and HRL, we develop a stock agent to predict temporal price sequences from historical stock price data and a vehicle agent to predict steering angles from first person, dash cam images. Our results in both domains indicate that a type of HRL, called feudal reinforcement learning, provides significant improvements to training speed and stability and prediction accuracy over standard RL. A key component to this success is the multi-resolution structure that introduces both temporal and spatial abstraction into the network hierarchy.
Past Data-Driven Adaptation in Hierarchical Reinforcement Learning

Sijie Zhang,Aiguo Chen,Tianzi Wang,Xincen Zhou

DOI: https://doi.org/10.1145/3651671.3651714

2024-01-01

Abstract:Reinforcement learning algorithms struggle with tasks that have complex hierarchical dependency structures. For this problem, humans usually represent the whole task in a structured way and solve it layer by layer. In this paper, we propose a novel approach called Past Data-Driven Adaptation in Hierarchical Reinforcement Learning (AdaHRL). AdaHRL leverages ’past samples’ from a replay buffer to discover subgoals and construct a subgoal tree, effectively steering the agent’s learning trajectory. Simultaneously, AdaHRL fine-tunes the data distribution of the entire replay buffer using a filter function, empowering adaptive learning within the agent. Experimental results demonstrate that our approach outperforms Unified Model-Free HRL Framework (UHRL) and Hindsight experience replay (HER) in tasks with complex hierarchical dependencies.
A Brief Review of Recent Hierarchical Reinforcement Learning for Robotic Manipulation

Shuang Liu,Tao Tang,Wenzhuo Zhang,Jiabao Cui,Xin Xu

DOI: https://doi.org/10.1109/isctech58360.2022.00119

2022-01-01

Abstract:Deep reinforcement learning (DRL) has become a popular learning paradigm for decision and control and has been widely applied in robot manipulation in recent years. However, due to its special learning pattern of “trial and error”, some problems with DRL remain. Issues such as the exploration dilemma, sample inefficiency, and slow convergence still need to be addressed, especially when faced with complex long-horizon tasks. As a solution to these limitations, hierarchical reinforcement learning (HRL) has been proposed and developed; it decomposes challenging tasks into multiple simpler subtasks in order to solve the main task efficiently in a “divide and conquer” manner. At present, there are comprehensive HRL methods for robotic manipulation tasks, while a review is lacking. To help researchers form a general view of this field, we systematically summarize related HRL methods for robotic manipulation. This review sorts the literature using a novel taxonomy of subtask generation and divides HRL methods into two categories: handcrafted subtask generation and learning-based subtask generation. A great number of representative methods are analyzed in detail. In the end, we also present some important future directions for HRL.
