Abstract: Deep reinforcement learning (RL) works impressively in some environments and fails catastrophically in others. Ideally, RL theory should be able to explain why, i.e., provide bounds that are predictive of practical performance. Unfortunately, current theory does not quite have this ability. We compare standard deep RL algorithms to prior sample complexity bounds by introducing a new dataset, BRIDGE. It consists of 155 deterministic MDPs from common deep RL benchmarks, along with their corresponding tabular representations, which enables us to compute instance-dependent bounds exactly. We choose to focus on deterministic environments because they share many interesting properties of stochastic environments but are easier to analyze. Using BRIDGE, we find that prior bounds do not correlate well with when deep RL succeeds vs. fails, but we discover a surprising property that does. When the actions with the highest Q-values under the random policy also have the highest Q-values under the optimal policy (i.e., when it is optimal to act greedily on the random policy's Q-function), deep RL tends to succeed; when they do not, deep RL tends to fail. We generalize this property into a new complexity measure of an MDP that we call the effective horizon, which roughly corresponds to how many steps of lookahead search would be needed in that MDP to identify the next optimal action when leaf nodes are evaluated with random rollouts. Using BRIDGE, we show that bounds based on the effective horizon reflect the empirical performance of PPO and DQN more closely than prior sample complexity bounds across four metrics. We also find that, unlike existing bounds, the effective horizon can predict the effects of using reward shaping or a pre-trained exploration policy. Our code and data are available at https://github.com/cassidylaidlaw/effective-horizon.
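The property described in the abstract (acting greedily on the random policy's Q-function is optimal) and its generalization to deeper lookahead can be sketched for a small tabular MDP. This is a minimal illustration under our own assumptions, not the authors' code from the linked repository. The assumptions are: the MDP is deterministic, finite-horizon, and undiscounted; it is given as hypothetical arrays `next_state[s, a]` and `reward[s, a]`; every (timestep, state) pair is checked rather than only reachable ones; and exact expected values stand in for random rollouts at the leaves. The helper names are hypothetical. Here `Q^1` is the random policy's Q-function, and each `Q^(k+1)` applies one Bellman optimality backup to `Q^k`, so the returned depth counts steps of lookahead.

```python
import numpy as np


def random_and_optimal_q(next_state, reward, horizon):
    """Exact Q-values of the uniform random policy and of the optimal policy.

    Assumes a deterministic, undiscounted MDP: taking action a in state s
    yields reward[s, a] and moves to next_state[s, a]. Returns two arrays
    of shape (horizon, num_states, num_actions).
    """
    num_states, num_actions = reward.shape
    q_rand = np.zeros((horizon, num_states, num_actions))
    q_opt = np.zeros((horizon, num_states, num_actions))
    v_rand_next = np.zeros(num_states)  # state values after the last step are 0
    v_opt_next = np.zeros(num_states)
    for t in reversed(range(horizon)):
        q_rand[t] = reward + v_rand_next[next_state]
        q_opt[t] = reward + v_opt_next[next_state]
        v_rand_next = q_rand[t].mean(axis=1)  # uniform random policy
        v_opt_next = q_opt[t].max(axis=1)     # optimal policy
    return q_rand, q_opt


def greedy_is_optimal(q, q_opt, atol=1e-9):
    """True if, at every (t, s), every action maximizing q is also optimal."""
    greedy = np.isclose(q, q.max(axis=2, keepdims=True), atol=atol)
    optimal = np.isclose(q_opt, q_opt.max(axis=2, keepdims=True), atol=atol)
    return bool(np.all(~greedy | optimal))


def lookahead_depth(next_state, reward, horizon, max_depth=None):
    """Smallest k such that acting greedily on Q^k is optimal (hypothetical helper).

    Q^1 is the random policy's Q-function; Q^(k+1) backs up Q^k by one step.
    Returns None if no k <= max_depth works (cannot happen when max_depth
    equals the horizon, since Q^horizon equals the optimal Q-function).
    """
    q_rand, q_opt = random_and_optimal_q(next_state, reward, horizon)
    max_depth = horizon if max_depth is None else max_depth
    q = q_rand
    for k in range(1, max_depth + 1):
        if greedy_is_optimal(q, q_opt):
            return k
        backed_up = np.zeros_like(q)
        for t in range(horizon):
            if t + 1 < horizon:
                v_next = q[t + 1].max(axis=1)
            else:
                v_next = np.zeros(q.shape[1])
            backed_up[t] = reward + v_next[next_state]
        q = backed_up
    return None


# Toy "trap" MDP (illustrative): from state 0, action 0 pays 1 immediately,
# while action 1 pays 0 but leads to state 2, where one action pays 10 and
# the other pays -100. The random policy averages the two, so it undervalues
# action 1.
next_state = np.array([[1, 2], [1, 1], [3, 3], [3, 3]])
reward = np.array([[1.0, 0.0], [0.0, 0.0], [10.0, -100.0], [0.0, 0.0]])

q_rand, q_opt = random_and_optimal_q(next_state, reward, horizon=2)
print(greedy_is_optimal(q_rand, q_opt))               # → False
print(lookahead_depth(next_state, reward, horizon=2))  # → 2
```

In this toy MDP, greedy action selection on the random policy's Q-values takes the immediate reward of 1. One extra step of lookahead instead reveals the reward of 10 behind action 1. If the -100 reward is replaced with 0, the random policy already ranks action 1 higher, and the depth drops to 1.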

Bridging Scenarios in Reinforcement Learning with Continuously Generated Relaying Predictive Models

TRCC: Transferable Congestion Control with Reinforcement Learning

LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots

Model-Based Transfer Reinforcement Learning Based on Graphical Model Representations

Enabling Multi-Agent Transfer Reinforcement Learning via Scenario Independent Representation

Efficient Deep Reinforcement Learning Via Adaptive Policy Transfer

Transfer Reinforcement Learning for Dynamic Spectrum Environment

Learning to Bridge the Gap: Efficient Novelty Recovery with Planning and Reinforcement Learning

Bridging Locomotion and Manipulation Using Reconfigurable Robotic Limbs via Reinforcement Learning

Federated Transfer Reinforcement Learning for Autonomous Driving

Real–Sim–Real Transfer for Real-World Robot Control Policy Learning with Deep Reinforcement Learning

Efficient Deep Reinforcement Learning Through Policy Transfer

Parallel learner: A practical deep reinforcement learning framework for multi-scenario games

Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL

Constrained Reinforcement Learning Under Model Mismatch

REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer

Reactive power optimization via deep transfer reinforcement learning for efficient adaptation to multiple scenarios

A Platform-Agnostic Deep Reinforcement Learning Framework for Effective Sim2Real Transfer towards Autonomous Driving

Bridging RL Theory and Practice with the Effective Horizon

Bridging the simulation-to-real gap of depth images for deep reinforcement learning

Heuristically Accelerated Reinforcement Learning by Means of Case-Based Reasoning and Transfer Learning