Abstract: Hierarchical reinforcement learning excels at dividing difficult task goals into easily achievable subgoals. It provides an effective means of solving long-horizon planning tasks in high-dimensional, complex environments. However, because training multiple levels of policies simultaneously is challenging, hierarchical reinforcement learning often suffers from training non-stationarity. Existing work analyzing this non-stationarity focuses on the noisy data created by changes in the low-level policy, which introduce aleatoric uncertainty into the high-level policy. But the sources of uncertainty that destabilize high-level policy training are manifold. First, the randomness of the environment also generates noise in the high-level replay buffer, contributing further aleatoric uncertainty. Second, the limited set of transitions resulting from the agent’s insufficient exploration gives rise to the high-level policy’s epistemic uncertainty. In this paper, we first comprehensively examine the causes of training instability in hierarchical reinforcement learning in order to address the uncertainty of high-level policy networks. On this basis, we propose uncertainty-aware hierarchical reinforcement learning (UAHRL), a novel framework for solving long-horizon tasks with stable learning. UAHRL constructs an action uncertainty estimation network based on deep ensembles to capture both types of uncertainty. The estimated uncertainties are then taken into account in the high-level training process to reduce non-stationarity. Experimental results demonstrate that UAHRL outperforms state-of-the-art hierarchical reinforcement learning algorithms in sample efficiency while also performing better on a series of long-horizon tasks with continuous action and state spaces.
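
The abstract does not spell out how the deep ensemble separates the two kinds of uncertainty. A common decomposition for ensembles of Gaussian predictors, shown here as a minimal sketch and not as the UAHRL implementation, treats the average predicted variance as aleatoric uncertainty and the disagreement between member means as epistemic uncertainty. The function names, array shapes, and the weighting rule below are assumptions made for illustration.

```python
import numpy as np


def decompose_uncertainty(means, variances):
    """Split ensemble predictions into aleatoric and epistemic parts.

    means, variances: arrays of shape (n_members, action_dim), holding each
    ensemble member's predicted mean and variance for one high-level action.
    (Hypothetical interface; the paper's network may differ.)
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    aleatoric = variances.mean(axis=0)  # average noise predicted by the members
    epistemic = means.var(axis=0)       # disagreement between member means
    return aleatoric, epistemic, aleatoric + epistemic


def uncertainty_weight(total, temperature=1.0):
    """Hypothetical weighting rule: shrink the contribution of a transition
    as its summed uncertainty grows. Assumed for illustration only."""
    return 1.0 / (1.0 + temperature * float(np.sum(total)))


# Two ensemble members, two action dimensions.
member_means = [[0.0, 1.0], [2.0, 1.0]]
member_vars = [[0.5, 0.1], [0.5, 0.3]]
aleatoric, epistemic, total = decompose_uncertainty(member_means, member_vars)
weight = uncertainty_weight(total)
```

In this example the members disagree only on the first action dimension, so the epistemic term is nonzero there and zero on the second dimension, while the aleatoric term reflects the noise each member predicts.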

Uncertainty Quantification for Operators in Online Reinforcement Learning

Gradient Q : A Unified Algorithm with Function Approximation for Reinforcement Learning

Fundamental Limits of Reinforcement Learning in Environment with Endogeneous and Exogeneous Uncertainty

Uncertainty-Aware Low-Rank Q-Matrix Estimation for Deep Reinforcement Learning

Online Algorithms with Uncertainty-Quantified Predictions

Uncertainty Modified Policy for Multi-Agent Reinforcement Learning

Risk Aversion Operator for Addressing Maximization Bias in Q-Learning

Robust Optimization for Quantum Reinforcement Learning Control Using Partial Observations

Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks

Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles

UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning

Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model

MQES: Max-Q Entropy Search for Efficient Exploration in Continuous Reinforcement Learning

Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness

Uncertainty-aware hierarchical reinforcement learning for long-horizon tasks

Scalable Uncertainty Quantification for Deep Operator Networks using Randomized Priors

Controlling Underestimation Bias in Reinforcement Learning Via Minmax Operation

Provably efficient exploration in quantum reinforcement learning with logarithmic worst-case regret

Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning

Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning

The Uncertainty Bellman Equation and Exploration