Abstract: We give a model-based agent that builds a Python program representing its knowledge of the world based on its interactions with the environment. The world model tries to explain its interactions, while also being optimistic about what reward it can achieve. We define this optimism as a logical constraint between a program and a planner. We study our agent on gridworlds, and on task planning, finding our approach is more sample-efficient compared to deep RL, more compute-efficient compared to ReAct-style agents, and that it can transfer its knowledge across environments by editing its code.
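One way to read the optimism constraint in the abstract is: a candidate world model is acceptable only if a planner, run against that model, can find some action sequence that earns reward. The sketch below illustrates that reading on a toy five-cell corridor. Everything in it (the corridor, `transition`, `optimistic_reward`, `pessimistic_reward`, `plan`, `is_optimistic`, and the breadth-first planner) is a hypothetical stand-in chosen for illustration, not the paper's implementation.

```python
from collections import deque

ACTIONS = ("left", "right")


def transition(state, action):
    """Hypothetical dynamics: an agent moving along cells 0..4."""
    step = 1 if action == "right" else -1
    return min(4, max(0, state + step))


def optimistic_reward(state, action, next_state):
    """Guess that reaching the far cell (4) pays off."""
    return 1.0 if next_state == 4 else 0.0


def pessimistic_reward(state, action, next_state):
    """Guess that nothing is ever rewarded."""
    return 0.0


def plan(transition_fn, reward_fn, start, max_depth=10):
    """Breadth-first search for a shortest action sequence that ends in positive reward."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if len(actions) >= max_depth:
            continue
        for action in ACTIONS:
            nxt = transition_fn(state, action)
            if reward_fn(state, action, nxt) > 0:
                return actions + [action]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None  # the model predicts that no reward is reachable


def is_optimistic(transition_fn, reward_fn, start):
    """Assumed form of the constraint: the planner finds a rewarding plan under the model."""
    return plan(transition_fn, reward_fn, start) is not None


if __name__ == "__main__":
    print(plan(transition, optimistic_reward, 0))
    print(is_optimistic(transition, pessimistic_reward, 0))
```

Under this reading, the pessimistic model would be rejected even if it matched every observation so far, because it gives the planner nothing to aim for; the optimistic model sends the agent toward cell 4, where the guess is either confirmed or contradicted by real experience.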

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper "WorldCoder: Code-Based World Model Agent" attempts to solve the following problems:

1. **Rapid Learning of World Models**: How to enable agents to quickly build a world model that explains the observed data with minimal interaction with the environment.
2. **Knowledge Transfer**: How to allow agents to reuse existing knowledge across different tasks and environments to improve learning efficiency.
3. **Interpretability**: How to represent the world model as code so that humans can understand and audit the agent's knowledge.
4. **Goal-Driven Exploration**: How to guide agents to explore effectively in tasks with uncertainty and sparse rewards through optimistic assumptions.

### Specific Problem Description

- **Construction of World Models**: The paper proposes a code-based method for constructing world models, representing the agent's understanding of the world as Python programs. These programs predict the next state and reward, which lets the agent plan its actions.
- **Sample Efficiency**: The method is more sample-efficient than traditional deep reinforcement learning (deep RL), requiring fewer environment interactions to learn an effective world model.
- **Computational Efficiency**: The method is more compute-efficient than language-model-based agents such as ReAct, because once the world model is constructed, subsequent action decisions no longer need frequent calls to large language models (LLMs).
- **Knowledge Transfer**: By modifying its code, the agent can quickly adapt to different environments without having to learn each new task from scratch.
- **Goal-Driven Exploration**: The paper introduces a new learning objective that encourages the agent to remain optimistic in tasks with uncertainty and sparse rewards, promoting goal-oriented exploration.
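As a rough illustration of the points above, a code-based world model can be thought of as a pair of ordinary Python functions (one for the next state, one for the reward) that is checked against recorded experience and edited when it fails to explain a transition. The sketch below is a minimal, assumed setup: the grid, the `WALLS` set, the `Step` record, `transition_v1`, `transition_v2`, `reward`, and `unexplained` are all hypothetical names, and the revised function is written by hand to stand in for whatever edit the agent would actually make.

```python
from typing import NamedTuple


class Step(NamedTuple):
    """One recorded interaction with the environment (hypothetical format)."""
    state: tuple       # (x, y) agent position
    action: str
    next_state: tuple
    reward: float


MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
WALLS = {(1, 0)}  # assumed wall layout, for illustration only


def transition_v1(state, action):
    """First draft of the dynamics: the agent always moves."""
    dx, dy = MOVES[action]
    return (state[0] + dx, state[1] + dy)


def transition_v2(state, action):
    """Edited draft: moving into a wall leaves the agent in place."""
    dx, dy = MOVES[action]
    target = (state[0] + dx, state[1] + dy)
    return state if target in WALLS else target


def reward(state, action, next_state):
    """Assumed reward rule: arriving at cell (2, 1) is the goal."""
    return 1.0 if next_state == (2, 1) else 0.0


def unexplained(transition_fn, reward_fn, replay):
    """Return the recorded steps that the candidate model fails to reproduce."""
    return [
        step for step in replay
        if transition_fn(step.state, step.action) != step.next_state
        or reward_fn(step.state, step.action, step.next_state) != step.reward
    ]


replay = [
    Step((0, 0), "down", (0, 1), 0.0),
    Step((0, 0), "right", (0, 0), 0.0),  # the agent bumped into the wall at (1, 0)
    Step((1, 1), "right", (2, 1), 1.0),
]

if __name__ == "__main__":
    print(len(unexplained(transition_v1, reward, replay)))  # 1: the wall bump
    print(len(unexplained(transition_v2, reward, replay)))  # 0: all steps explained
```

Because the model is plain code, the mismatch found by `unexplained` points at a specific transition, and the fix is a small edit rather than retraining. Once the model explains the data, planning against it needs no further language-model calls in this sketch.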
### Experimental Validation

- **Sokoban**: Experiments show that the agent can learn the basic rules of the Sokoban game in a short time and handle levels with more boxes. Compared to Deep Reinforcement Learning, this method significantly improves sample efficiency.
- **Minigrid**: Through a series of experiments in Minigrid environments, the agent's ability to transfer knowledge between different tasks and its zero-shot generalization ability in new tasks were validated.
- **AlfWorld**: In the AlfWorld robotic task planning environment, the agent demonstrated its learning ability in complex tasks and the effectiveness of its exploration strategies.

### Summary

The paper "WorldCoder: Code-Based World Model Agent" addresses several key issues such as rapid learning, knowledge transfer, interpretability, and goal-driven exploration by constructing code-based world models, providing new insights for efficient learning of agents in complex tasks.

WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment
