Abstract: Deep reinforcement learning (RL) works impressively in some environments and fails catastrophically in others. Ideally, RL theory should be able to provide an understanding of why this is, i.e. bounds predictive of practical performance. Unfortunately, current theory does not quite have this ability. We compare standard deep RL algorithms to prior sample complexity bounds by introducing a new dataset, BRIDGE. It consists of 155 deterministic MDPs from common deep RL benchmarks, along with their corresponding tabular representations, which enables us to exactly compute instance-dependent bounds. We choose to focus on deterministic environments because they share many interesting properties of stochastic environments, but are easier to analyze. Using BRIDGE, we find that prior bounds do not correlate well with when deep RL succeeds vs. fails, but discover a surprising property that does. When actions with the highest Q-values under the random policy also have the highest Q-values under the optimal policy (i.e. when it is optimal to be greedy on the random policy's Q function), deep RL tends to succeed; when they don't, deep RL tends to fail. We generalize this property into a new complexity measure of an MDP that we call the effective horizon, which roughly corresponds to how many steps of lookahead search would be needed in that MDP in order to identify the next optimal action, when leaf nodes are evaluated with random rollouts. Using BRIDGE, we show that the effective horizon-based bounds are more closely reflective of the empirical performance of PPO and DQN than prior sample complexity bounds across four metrics. We also find that, unlike existing bounds, the effective horizon can predict the effects of using reward shaping or a pre-trained exploration policy. Our code and data are available at https://github.com/cassidylaidlaw/effective-horizon
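The property described in the abstract (being greedy on the random policy's Q function is optimal) can be checked exactly on a small tabular MDP. The following is a minimal sketch, not the authors' implementation: it assumes a finite-horizon, deterministic MDP given as hypothetical `next_state[s][a]` and `reward[s][a]` tables, and it checks every state at every timestep rather than only reachable ones. The two toy MDPs and all function names are illustrative assumptions.

```python
def mean(values):
    """State value under the uniformly random policy."""
    return sum(values) / len(values)


def q_values(next_state, reward, horizon, backup):
    """Backward induction on a deterministic, finite-horizon tabular MDP.

    `backup` turns a state's list of Q-values into its value:
    `max` gives the optimal Q function, `mean` gives the Q function
    of the uniformly random policy. Returns q[t][s][a].
    """
    n_states, n_actions = len(next_state), len(next_state[0])
    value = [0.0] * n_states  # no reward is collected after the last step
    q = []
    for _ in range(horizon):  # iterate from the last timestep to the first
        q_t = [
            [reward[s][a] + value[next_state[s][a]] for a in range(n_actions)]
            for s in range(n_states)
        ]
        value = [backup(q_t[s]) for s in range(n_states)]
        q.insert(0, q_t)
    return q


def greedy_on_random_is_optimal(next_state, reward, horizon, tol=1e-9):
    """True if every action maximizing Q^rand also maximizes Q^*."""
    q_rand = q_values(next_state, reward, horizon, mean)
    q_opt = q_values(next_state, reward, horizon, max)
    for t in range(horizon):
        for s in range(len(next_state)):
            best_rand = max(q_rand[t][s])
            best_opt = max(q_opt[t][s])
            for a, q in enumerate(q_rand[t][s]):
                if q >= best_rand - tol and q_opt[t][s][a] < best_opt - tol:
                    return False
    return True


# Hypothetical toy MDP 1: a short chain; action 1 moves right, and
# stepping from state 1 into state 2 pays reward 1.
EASY_NEXT = [[0, 1], [1, 2], [2, 2]]
EASY_REWARD = [[0, 0], [0, 1], [0, 0]]

# Hypothetical toy MDP 2: action 0 in state 0 pays 1 immediately and ends
# the episode (state 3); the path 0 -> 1 -> 2 pays 3, but only if action 1
# is chosen at every step, which random rollouts rarely do.
HARD_NEXT = [[3, 1], [3, 2], [3, 3], [3, 3]]
HARD_REWARD = [[1, 0], [0, 0], [0, 3], [0, 0]]

print(greedy_on_random_is_optimal(EASY_NEXT, EASY_REWARD, horizon=3))  # True
print(greedy_on_random_is_optimal(HARD_NEXT, HARD_REWARD, horizon=3))  # False
```

In the second MDP, the random policy's Q-values at the start state are 1 for the immediate reward and 0.75 for the longer path, while the optimal Q-values are 1 and 3, so acting greedily on the random policy's Q function picks the suboptimal action. This is the kind of environment where, per the abstract, deep RL tends to fail.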

Exploiting Multiple Abstractions in Episodic RL via Reward Shaping

Extracting Heuristics from Large Language Models for Reward Shaping in Reinforcement Learning

Addressing Reward Engineering For Deep Reinforcement Learning On Multi-Stage Task

Exploring the limits of Hierarchical World Models in Reinforcement Learning

Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?

Hierarchical Reinforcement Learning from Demonstration via Reachability-Based Reward Shaping

Episodic Reinforcement Learning with Expanded State-reward Space

Hierarchical Reinforcement Learning: A Survey and Open Research Challenges

Deep Reinforcement Learning from Hierarchical Preference Design

Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Geometric Active Exploration in Markov Decision Processes: the Benefit of Abstraction

Provably Benefits of Deep Hierarchical RL

Graph learning-based generation of abstractions for reinforcement learning

Overcoming Exploration in Reinforcement Learning with Demonstrations

Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications

Guiding Multi-agent Multi-task Reinforcement Learning by a Hierarchical Framework with Logical Reward Shaping

Bridging RL Theory and Practice with the Effective Horizon

Reward Shaping via Meta-Learning

Hierarchical reinforcement learning for efficient exploration and transfer

Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations

Learning Representations in Model-Free Hierarchical Reinforcement Learning