Abstract: In reinforcement learning (RL), agents learn policies from spatiotemporal data generated by interacting with the environment. However, the spatiotemporal data that carry reward signals are often sparse: the agent receives a reward only when it reaches the goal state. Such problems are technically challenging because rewards are the basis of policy learning. Recently, prior knowledge of the task structure has been used to address them, for example in hierarchical learning and goal-conditioned learning. In this paper, we consider a common task structure called Hard-Transiting. In a Hard-Transiting task, it becomes increasingly difficult for the agent to move forward as it approaches the goal state. We formalize the Hard-Transiting RL problem with sparse rewards, in which the transition probability decreases as the agent approaches the goal state. Inspired by reward bonuses in interactive spatiotemporal data, we propose two novel algorithms with efficient exploration to solve such problems. For the tabular setting, we adopt the Transition Exploratory Bonus (TEB) to encourage exploration in Hard-Transiting problems and propose Model-Based Interval Estimation-TEB (MBIE-TEB), in which TEB is incorporated into the value iteration phase of the conventional MBIE algorithm. Using sample complexity as the performance metric, we prove an upper bound for MBIE-TEB. For the non-tabular setting, we propose Deep Q-Network-TEB (DQN-TEB), in which TEB serves as the intrinsic motivation in DQN. We test the proposed algorithms on two numerical tasks and one large-scale task. The experimental results demonstrate that, with the transition exploratory bonus, the proposed algorithms outperform the compared algorithms.

Leveraging Transition Exploratory Bonus for Efficient Exploration in Hard-Transiting Reinforcement Learning Problems.
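To make the setting in the abstract concrete, the following is a minimal sketch of a Hard-Transiting chain with sparse rewards, together with value iteration on an empirical model plus an exploration bonus. The abstract does not state the TEB formula, so the sketch substitutes the standard count-based bonus beta / sqrt(n(s, a)) from MBIE-EB as a stand-in; it is not the paper's MBIE-TEB. The chain size, the linear decay of the forward probability, and every name (`forward_prob`, `step`, `record`, `bonus_value_iteration`, `beta`, `q_opt`) are assumptions made for illustration.

```python
import math
import random

N_STATES = 6            # hypothetical chain: states 0..5, state 5 is the goal
GOAL = N_STATES - 1
STAY, FORWARD = 0, 1

def forward_prob(s, p_start=0.9, p_end=0.2):
    """Chance that FORWARD succeeds; falls linearly as s nears the goal (assumed shape)."""
    frac = s / (N_STATES - 2)          # 0 at the start, 1 at the last non-goal state
    return p_start + frac * (p_end - p_start)

def step(s, a, rng):
    """One environment step; the reward is 1 only when the goal is entered."""
    moved = a == FORWARD and rng.random() < forward_prob(s)
    s_next = s + 1 if moved else s
    return s_next, (1.0 if s_next == GOAL else 0.0)

def empty_counts():
    """Visit counts n(s,a), transition counts n(s,a,s'), and summed rewards."""
    n_sa = [[0, 0] for _ in range(N_STATES)]
    n_sas = [[[0] * N_STATES for _ in range(2)] for _ in range(N_STATES)]
    r_sum = [[0.0, 0.0] for _ in range(N_STATES)]
    return n_sa, n_sas, r_sum

def record(n_sa, n_sas, r_sum, s, a, r, s_next):
    """Update the empirical model with one observed transition."""
    n_sa[s][a] += 1
    n_sas[s][a][s_next] += 1
    r_sum[s][a] += r

def bonus_value_iteration(n_sa, n_sas, r_sum, beta=0.5, gamma=0.95, iters=200):
    """Value iteration on the empirical model with a count-based bonus.

    The bonus beta / sqrt(n) is the MBIE-EB form, used here as a stand-in
    because the abstract does not define TEB.
    """
    q_opt = (1.0 + beta) / (1.0 - gamma)        # optimistic value for unvisited pairs
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # goal row stays 0 (terminal)
    for _ in range(iters):
        for s in range(GOAL):
            for a in (STAY, FORWARD):
                n = n_sa[s][a]
                if n == 0:
                    q[s][a] = q_opt
                    continue
                r_hat = r_sum[s][a] / n
                exp_next = sum(
                    n_sas[s][a][s2] / n * max(q[s2]) for s2 in range(N_STATES)
                )
                q[s][a] = r_hat + gamma * exp_next + beta / math.sqrt(n)
    return q
```

In use, an agent would alternate between calling `step` with a `random.Random` instance, passing each observed transition to `record`, and re-running `bonus_value_iteration` to act greedily on the returned table. Rarely tried state-action pairs keep a larger bonus, which is what pushes the agent to retry the low-probability forward moves near the goal.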

Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

A Temporally Correlated Latent Exploration for Reinforcement Learning

Reward Uncertainty for Exploration in Preference-based Reinforcement Learning

BeBold: Exploration Beyond the Boundary of Explored Regions

MADE: Exploration via Maximizing Deviation from Explored Regions

Bounded Exploration with World Model Uncertainty in Soft Actor-Critic Reinforcement Learning Algorithm

LiFE: Deep Exploration Via Linear-Feature Bonus in Continuous Control

Planning with Exploration: Addressing Dynamics Bottleneck in Model-based Reinforcement Learning

DQN with model-based exploration: efficient learning on environments with sparse rewards

Exploration by Maximizing Renyi Entropy for Reward-Free RL Framework.

Dynamic Subgoal-based Exploration via Bayesian Optimization

Incorporating Explanations to Balance the Exploration and Exploitation of Deep Reinforcement Learning.

Influence-Based Multi-Agent Exploration

Reward-Based Exploration: Adaptive Control for Deep Reinforcement Learning

Is Exploration All You Need? Effective Exploration Characteristics for Transfer in Reinforcement Learning

Efficient and Scalable Exploration Via Estimation-Error

Preference-Guided Reinforcement Learning for Efficient Exploration

ACDER: Augmented Curiosity-Driven Experience Replay

Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration