Abstract: Many machine learning systems are built to solve the hardest examples of a particular task, which often makes them large and expensive to run, especially with respect to the easier examples, which might require much less computation. For an agent with a limited computational budget, this "one-size-fits-all" approach may result in the agent wasting valuable computation on easy examples while not spending enough on hard examples. Rather than learning a single, fixed policy for solving all instances of a task, we introduce a metacontroller which learns to optimize a sequence of "imagined" internal simulations over predictive models of the world in order to construct a more informed, and more economical, solution. The metacontroller component is a model-free reinforcement learning agent, which decides both how many iterations of the optimization procedure to run and which model to consult on each iteration. The models (which we call "experts") can be state transition models, action-value functions, or any other mechanism that provides information useful for solving the task, and can be learned on-policy or off-policy in parallel with the metacontroller. When the metacontroller, controller, and experts were trained with "interaction networks" (Battaglia et al., 2016) as expert models, our approach was able to solve a challenging decision-making problem under complex non-linear dynamics. The metacontroller learned to adapt the amount of computation it performed to the difficulty of the task, and learned how to choose which experts to consult by factoring in both their reliability and their individual computational resource costs. This allowed the metacontroller to achieve a lower overall cost (task loss plus computational cost) than more traditional fixed-policy approaches. These results demonstrate that our approach is a powerful framework for using rich forward models for efficient model-based reinforcement learning.
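The loop described in the abstract (propose an action, consult an expert, refine, and decide when to stop) can be illustrated with a toy sketch. This is not the authors' implementation: the scalar task, the two experts, the step size, and the hand-written `threshold_metacontroller` rule are all assumptions made for illustration, and the rule stands in for what the abstract describes as a learned model-free reinforcement learning policy. The sketch only shows how a total cost of task loss plus computational cost arises, and how harder instances can end up consuming more consultations.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Expert:
    """Hypothetical expert: predicts the signed error of a proposed action."""
    name: str
    cost: float                          # compute cost charged per consultation
    evaluate: Callable[[float], float]   # proposed action -> predicted error


def make_experts(target: float) -> List[Expert]:
    # Assumed toy experts: a cheap one that underestimates the error,
    # and an expensive one that is exact.
    return [
        Expert("cheap", 0.005, lambda a: 0.8 * (a - target)),
        Expert("accurate", 0.02, lambda a: a - target),
    ]


History = List[Tuple[int, float, float]]  # (expert index, action, predicted error)


def threshold_metacontroller(history: History) -> Optional[int]:
    """Hand-written stand-in for a learned metacontroller policy.

    Returns the index of the expert to consult next, or None to stop.
    """
    if not history:
        return 0                          # start with the cheap expert
    last_error = abs(history[-1][2])
    if last_error < 0.1:
        return None                       # good enough: stop imagining
    return 0 if last_error > 1.0 else 1   # coarse phase cheap, fine phase accurate


def run_episode(target, experts, choose, max_steps=20, step_size=0.5):
    action, compute_cost, history = 0.0, 0.0, []
    for _ in range(max_steps):
        idx = choose(history)
        if idx is None:
            break
        predicted_error = experts[idx].evaluate(action)  # "imagined" evaluation
        compute_cost += experts[idx].cost
        history.append((idx, action, predicted_error))
        action -= step_size * predicted_error            # controller refinement
    task_loss = (action - target) ** 2
    # Total cost combines task loss and computational cost.
    return action, task_loss, compute_cost, len(history)


if __name__ == "__main__":
    for target in (1.0, 4.0):
        action, loss, cost, n = run_episode(
            target, make_experts(target), threshold_metacontroller
        )
        print(f"target={target} consultations={n} total_cost={loss + cost:.4f}")
```

In this toy setting the instance that starts farther from its target triggers more consultations before the stopping rule fires, which loosely mirrors the adaptive-computation behavior the abstract reports. A learned metacontroller would replace the fixed thresholds with a policy trained against the combined cost.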

Deep Reinforcement Learning Behavioral Mode Switching Using Optimal Control Based on a Latent Space Objective

Discovering Behavioral Modes in Deep Reinforcement Learning Policies Using Trajectory Clustering in Latent Space

Deep Model-Based Reinforcement Learning for Predictive Control of Robotic Systems with Dense and Sparse Rewards

Behavior Decision of Mobile Robot With a Neurophysiologically Motivated Reinforcement Learning Model

Specialized Deep Residual Policy Safe Reinforcement Learning-Based Controller for Complex and Continuous State-Action Spaces

Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models

Latent-Conditioned Policy Gradient for Multi-Objective Deep Reinforcement Learning

Policy-Independent Behavioral Metric-Based Representation for Deep Reinforcement Learning

Continuously Discovering Novel Strategies Via Reward-Switching Policy Optimization

Hybrid Reinforcement Learning for Optimal Control of Non-Linear Switching System

Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient

Metacontrol for Adaptive Imagination-Based Optimization

Reclaiming the Source of Programmatic Policies: Programmatic versus Latent Spaces

Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection

Latent Context Based Soft Actor-Critic

Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning

Adversarial Policy Optimization in Deep Reinforcement Learning

Mixed Reinforcement Learning for Efficient Policy Optimization in Stochastic Environments

Learning When to Switch: Composing Controllers to Traverse a Sequence of Terrain Artifacts

Model-Based Reinforcement Learning via Meta-Policy Optimization

A Safe Reinforcement Learning driven Weights-varying Model Predictive Control for Autonomous Vehicle Motion Control