Abstract: Many machine learning systems are built to solve the hardest examples of a particular task, which often makes them large and expensive to run, especially on the easier examples, which might require much less computation. For an agent with a limited computational budget, this "one-size-fits-all" approach may result in the agent wasting valuable computation on easy examples while not spending enough on hard examples. Rather than learning a single, fixed policy for solving all instances of a task, we introduce a metacontroller which learns to optimize a sequence of "imagined" internal simulations over predictive models of the world in order to construct a more informed, and more economical, solution. The metacontroller component is a model-free reinforcement learning agent, which decides both how many iterations of the optimization procedure to run and which model to consult on each iteration. The models (which we call "experts") can be state transition models, action-value functions, or any other mechanism that provides information useful for solving the task, and can be learned on-policy or off-policy in parallel with the metacontroller. When the metacontroller, controller, and experts were trained with "interaction networks" (Battaglia et al., 2016) as expert models, our approach was able to solve a challenging decision-making problem under complex non-linear dynamics. The metacontroller learned to adapt the amount of computation it performed to the difficulty of the task, and learned how to choose which experts to consult by factoring in both their reliability and their individual computational resource costs. This allowed the metacontroller to achieve a lower overall cost (task loss plus computational cost) than more traditional fixed-policy approaches. These results demonstrate that our approach is a powerful framework for using rich forward models for efficient model-based reinforcement learning.
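The loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the 1-D quadratic task, the `Expert` class, the two toy experts with their costs, and the hand-written `threshold_policy` (standing in for the learned, model-free metacontroller) are all assumptions made for illustration. The sketch shows only the two decisions the metacontroller makes (when to stop and which expert to consult) and the accounting of total cost as task loss plus computational cost.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Expert:
    name: str
    # Maps (target, control) to (predicted task loss, gradient w.r.t. control).
    evaluate: Callable[[float, float], Tuple[float, float]]
    cost: float  # computational cost charged per consultation (assumed value)


def make_quadratic_expert(name: str, bias: float, cost: float) -> Expert:
    """Toy expert that believes the goal sits at bias * target (bias=1.0 is exact)."""
    def evaluate(target: float, control: float) -> Tuple[float, float]:
        err = control - bias * target
        return err * err, 2.0 * err
    return Expert(name, evaluate, cost)


def threshold_policy(last_pred: Optional[float]) -> Optional[int]:
    """Hypothetical hand-written stand-in for the learned metacontroller.

    Returns the index of the expert to consult next, or None to stop.
    """
    if last_pred is None or last_pred >= 0.5:
        return 0      # far from the goal: use the cheap, biased expert
    if last_pred >= 0.05:
        return 1      # closer: pay for the accurate expert
    return None       # predicted loss is already small: stop and act


def run_episode(
    target: float,
    experts: List[Expert],
    policy: Callable[[Optional[float]], Optional[int]] = threshold_policy,
    step_size: float = 0.25,
    max_iters: int = 20,
) -> Tuple[List[str], float, float]:
    """Run one episode; return (experts consulted, task loss, compute cost)."""
    control = 0.0
    last_pred: Optional[float] = None
    consulted: List[str] = []
    compute_cost = 0.0
    for _ in range(max_iters):
        choice = policy(last_pred)
        if choice is None:
            break
        expert = experts[choice]
        # One "imagined" evaluation of the current proposal.
        last_pred, grad = expert.evaluate(target, control)
        # The controller refines its proposal with a gradient step.
        control -= step_size * grad
        consulted.append(expert.name)
        compute_cost += expert.cost
    task_loss = (control - target) ** 2  # true loss on the toy task
    return consulted, task_loss, compute_cost


experts = [
    make_quadratic_expert("cheap", bias=0.9, cost=0.01),
    make_quadratic_expert("accurate", bias=1.0, cost=0.05),
]

for target in (0.1, 2.0):  # an "easy" and a "hard" instance
    names, task_loss, compute_cost = run_episode(target, experts)
    print(target, names, task_loss + compute_cost)
```

Under these assumed settings, the easy instance stops after a single cheap consultation, while the hard instance uses three cheap consultations followed by two accurate ones. In the paper the stopping and expert-selection rule is learned by reinforcement learning rather than fixed by thresholds.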

Is Imitation All You Need? Generalized Decision-Making with Dual-Phase Training

Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence

Configurable Mirror Descent: Towards a Unification of Decision Making

Think Before You Act: Decision Transformers with Working Memory

Universal embodied intelligence: learning from crowd, recognizing the world, and reinforced with experience

Multi-Game Decision Transformers

MA-Dreamer: Coordination and communication through shared imagination

Multi-Task Multi-Agent Shared Layers are Universal Cognition of Multi-Agent Coordination

LookALike: Human Mimicry based collaborative decision making

Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement

DUMA: a Dual-Mind Conversational Agent with Fast and Slow Thinking

Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation

LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations

Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces

Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models

Learning to Look: Seeking Information for Decision Making via Policy Factorization

Metacontrol for Adaptive Imagination-Based Optimization

Competitive Multi-agent Deep Reinforcement Learning with Counterfactual Thinking

Sample-efficient Imitative Multi-token Decision Transformer for Real-world Driving

Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks