Abstract: Many machine learning systems are built to solve the hardest examples of a particular task, which often makes them large and expensive to run, especially on the easier examples, which might require much less computation. For an agent with a limited computational budget, this "one-size-fits-all" approach may result in the agent wasting valuable computation on easy examples while not spending enough on hard ones. Rather than learning a single, fixed policy for solving all instances of a task, we introduce a metacontroller that learns to optimize a sequence of "imagined" internal simulations over predictive models of the world in order to construct a more informed, and more economical, solution. The metacontroller component is a model-free reinforcement learning agent that decides both how many iterations of the optimization procedure to run and which model to consult on each iteration. The models (which we call "experts") can be state transition models, action-value functions, or any other mechanism that provides information useful for solving the task, and can be learned on-policy or off-policy in parallel with the metacontroller. When the metacontroller, controller, and experts were trained with "interaction networks" (Battaglia et al., 2016) serving as the expert models, our approach was able to solve a challenging decision-making problem under complex non-linear dynamics. The metacontroller learned to adapt the amount of computation it performed to the difficulty of the task, and learned to choose which experts to consult by factoring in both their reliability and their individual computational costs. This allowed the metacontroller to achieve a lower overall cost (task loss plus computational cost) than more traditional fixed-policy approaches. These results demonstrate that our approach is a powerful framework for using rich forward models for efficient model-based reinforcement learning.
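The abstract describes the metacontroller's decision loop only in words. What follows is a toy, hypothetical sketch of that loop, not the paper's implementation. At each iteration a metacontroller policy either stops, which executes the controller's current proposal, or picks an expert to consult. Each consultation adds that expert's computational cost, and the objective is task loss plus total computational cost. Everything concrete here is an assumption made for illustration: the one-dimensional dynamics (the outcome equals the control), the two experts `exact` and `cheap` with made-up costs, the step-size rule the controller uses, and the hand-written policies `exact_only` and `mixed_policy`. These policies stand in for the learned model-free RL metacontroller and the learned controller described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


# Hypothetical "expert": a predictive model the metacontroller may consult.
@dataclass
class Expert:
    name: str
    cost: float                        # computational cost charged per consultation
    predict: Callable[[float], float]  # imagined outcome of applying a control


def run_episode(
    target: float,
    experts: List[Expert],
    policy: Callable[[float], Optional[int]],
    true_dynamics: Callable[[float], float],
    step_size: float = 0.5,
    max_iters: int = 10,
) -> dict:
    """One episode of (toy) imagination-based optimization."""
    control = 0.0                  # controller's initial proposal
    compute_cost = 0.0
    imagined_error = float("inf")  # no imagined feedback yet
    consulted: List[str] = []
    for _ in range(max_iters):
        # Metacontroller decision: None means stop and execute, else an expert index.
        choice = policy(imagined_error)
        if choice is None:
            break
        expert = experts[choice]
        compute_cost += expert.cost
        consulted.append(expert.name)
        # Imagined simulation: query the expert instead of the real world.
        imagined_error = target - expert.predict(control)
        # Controller refines its proposal using the imagined feedback.
        control += step_size * imagined_error
    # Only the final proposal is executed in the (toy) real world.
    task_loss = (true_dynamics(control) - target) ** 2
    return {
        "control": control,
        "task_loss": task_loss,
        "compute_cost": compute_cost,
        "total_cost": task_loss + compute_cost,  # objective: task loss + compute
        "consulted": consulted,
    }


EXACT, CHEAP = 0, 1
EXPERTS = [
    Expert("exact", cost=0.05, predict=lambda c: c),        # accurate but pricier
    Expert("cheap", cost=0.01, predict=lambda c: 0.9 * c),  # biased but cheap
]


def true_dynamics(c: float) -> float:
    return c  # hypothetical 1-D world: the outcome equals the control


def exact_only(err: float, tol: float = 0.1) -> Optional[int]:
    # Stop once the last imagined error is small; otherwise use the exact expert.
    return None if abs(err) < tol else EXACT


def mixed_policy(err: float, tol: float = 0.1, switch: float = 0.3) -> Optional[int]:
    # Cheap expert while the imagined error is large, exact expert near the end.
    if abs(err) < tol:
        return None
    return CHEAP if abs(err) > switch else EXACT


if __name__ == "__main__":
    for name, policy, target in [("exact_only", exact_only, 0.05),
                                 ("exact_only", exact_only, 1.0),
                                 ("mixed_policy", mixed_policy, 1.0)]:
        r = run_episode(target, EXPERTS, policy, true_dynamics)
        print(f"{name:12s} target={target}: {len(r['consulted'])} consults, "
              f"total cost {r['total_cost']:.4f}")
```

With these made-up numbers, `exact_only` stops after one consultation for an easy target (0.05) but needs five for a harder one (1.0), a toy analogue of adapting computation to difficulty. On the hard target, `mixed_policy` relies on the cheap, biased expert for the coarse steps and switches to the exact expert at the end. It reaches a lower total cost than `exact_only` (about 0.090 versus 0.251), a toy analogue of trading off expert reliability against computational cost.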

Optimizing Attention and Cognitive Control Costs Using Temporally-Layered Architectures

Temporally Layered Architecture for Adaptive, Distributed and Continuous Control

Dynamic allocation of limited memory resources in reinforcement learning

Reinforcement Learning with Brain-Inspired Modulation can Improve Adaptation to Environmental Changes

Temporal Difference Models: Model-Free Deep RL for Model-Based Control

Metacontrol for Adaptive Imagination-Based Optimization

A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations

The Paradox of Choice: Using Attention in Hierarchical Reinforcement Learning

Lifelong Reinforcement Learning via Neuromodulation

Reinforcement learning when your life depends on it: a neuro-economic theory of learning

Reducing the Deployment-Time Inference Control Costs of Deep Reinforcement Learning Agents via an Asymmetric Architecture

Simplified Temporal Consistency Reinforcement Learning

Attention or memory? Neurointerpretable agents in space and time

Efficient Deep Reinforcement Learning with Predictive Processing Proximal Policy Optimization

Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach

Learning at Variable Attentional Load Requires Cooperation of Working Memory, Meta-learning, and Attention-augmented Reinforcement Learning

Behavior Decision of Mobile Robot With a Neurophysiologically Motivated Reinforcement Learning Model

Neural architecture impact on identifying temporally extended Reinforcement Learning tasks

Evolving hierarchical memory-prediction machines in multi-task reinforcement learning

Optimizing Agent Behavior over Long Time Scales by Transporting Value