Abstract: Many machine learning systems are built to solve the hardest examples of a particular task, which often makes them large and expensive to run, especially with respect to the easier examples, which might require much less computation. For an agent with a limited computational budget, this "one-size-fits-all" approach may result in the agent wasting valuable computation on easy examples, while not spending enough on hard examples. Rather than learning a single, fixed policy for solving all instances of a task, we introduce a metacontroller which learns to optimize a sequence of "imagined" internal simulations over predictive models of the world in order to construct a more informed, and more economical, solution. The metacontroller component is a model-free reinforcement learning agent, which decides both how many iterations of the optimization procedure to run and which model to consult on each iteration. The models (which we call "experts") can be state transition models, action-value functions, or any other mechanism that provides information useful for solving the task, and can be learned on-policy or off-policy in parallel with the metacontroller. When the metacontroller, controller, and experts were trained with "interaction networks" (Battaglia et al., 2016) as expert models, our approach was able to solve a challenging decision-making problem under complex non-linear dynamics. The metacontroller learned to adapt the amount of computation it performed to the difficulty of the task, and learned how to choose which experts to consult by factoring in both their reliability and individual computational resource costs. This allowed the metacontroller to achieve a lower overall cost (task loss plus computational cost) than more traditional fixed policy approaches. These results demonstrate that our approach is a powerful framework for using rich forward models for efficient model-based reinforcement learning.
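The loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the learned model-free metacontroller is replaced by a hand-written stopping rule, the expert is a perfect toy model rather than an interaction network, and every name here (`Expert`, `run_episode`, `choose`, `propose`) is hypothetical. The sketch only shows how the overall cost combines task loss with the cost of the experts consulted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# (imagined action, expert's signed error estimate for that action)
History = List[Tuple[float, float]]


@dataclass
class Expert:
    """Hypothetical expert: returns task-relevant information at a fixed cost."""
    name: str
    cost: float
    evaluate: Callable[[float], float]  # action -> estimated signed error


def run_episode(
    experts: List[Expert],
    choose: Callable[[History], Optional[int]],  # stand-in for the metacontroller
    propose: Callable[[History], float],         # stand-in for the controller
    true_loss: Callable[[float], float],
    max_steps: int = 10,
) -> Tuple[float, float, float]:
    """Imagine until `choose` returns None, then act.

    Returns (action, task_loss, total_cost), where
    total_cost = task_loss + summed cost of the experts consulted.
    """
    history: History = []
    compute_cost = 0.0
    for _ in range(max_steps):
        idx = choose(history)
        if idx is None:  # metacontroller decides to stop pondering
            break
        action = propose(history)
        history.append((action, experts[idx].evaluate(action)))
        compute_cost += experts[idx].cost
    final_action = propose(history)
    task_loss = true_loss(final_action)
    return final_action, task_loss, task_loss + compute_cost


# Toy task (an assumption for illustration): reach a scalar target.
TARGET = 3.0
experts = [Expert("perfect_model", cost=0.125, evaluate=lambda a: a - TARGET)]


def propose(history: History) -> float:
    # Start at 0.0, then move halfway along the last estimated error.
    if not history:
        return 0.0
    last_action, last_error = history[-1]
    return last_action - 0.5 * last_error


def choose(history: History) -> Optional[int]:
    # Hand-written rule standing in for the learned policy:
    # keep consulting expert 0 until the estimated error is small.
    if history and abs(history[-1][1]) < 1.0:
        return None
    return 0


action, task_loss, total = run_episode(
    experts, choose, propose, true_loss=lambda a: (a - TARGET) ** 2
)
print(action, task_loss, total)
```

In this toy run the rule consults the expert three times before acting. A learned metacontroller would instead trade off the per-expert cost against the expected reduction in task loss, which is the behaviour the abstract reports.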

Online meta-learning by parallel algorithm competition

Online Fast Adaptation and Knowledge Accumulation: a New Approach to Continual Learning

Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Meta-Learning Adversarial Bandit Algorithms

Evolving Pareto-Optimal Actor-Critic Algorithms for Generalizability and Stability

Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning

Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL

Efficient Parallel Methods for Deep Reinforcement Learning

All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization

Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

Meta-SAGE: Scale Meta-Learning Scheduled Adaptation with Guided Exploration for Mitigating Scale Shift on Combinatorial Optimization

Neural Auto-Curricula

Discovering Temporally-Aware Reinforcement Learning Algorithms

Scalable Online Planning via Reinforcement Learning Fine-Tuning

Metacontrol for Adaptive Imagination-Based Optimization

Meta Learning Black-Box Population-Based Optimizers

Deep reinforcement learning algorithm based on multi-agent parallelism and its application in game environment

Algorithm Design for Online Meta-Learning with Task Boundary Detection

Evolution with Opponent-Learning Awareness

Distributed Evolution Strategies with Multi-Level Learning for Large-Scale Black-Box Optimization

Online Tuning for Offline Decentralized Multi-Agent Reinforcement Learning