WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment

Hao Tang, Darren Key, Kevin Ellis
2024-09-21
Abstract: We give a model-based agent that builds a Python program representing its knowledge of the world based on its interactions with the environment. The world model tries to explain its interactions, while also being optimistic about what reward it can achieve. We define this optimism as a logical constraint between a program and a planner. We study our agent on gridworlds, and on task planning, finding our approach is more sample-efficient compared to deep RL, more compute-efficient compared to ReAct-style agents, and that it can transfer its knowledge across environments by editing its code.
Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper "WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment" attempts to solve the following problems:

1. **Rapid Learning of World Models**: How to enable an agent to build a world model that explains the observed data from only a small number of interactions with the environment.
2. **Knowledge Transfer**: How to let an agent reuse existing knowledge across tasks and environments to improve learning efficiency.
3. **Interpretability**: How to represent the world model as code so that humans can understand and audit the agent's knowledge.
4. **Goal-Driven Exploration**: How to use optimistic assumptions to guide effective exploration in tasks with uncertainty and sparse rewards.

### Specific Problem Description

- **Construction of World Models**: The paper proposes a code-based method for constructing world models, in which the agent's understanding of the world is written as Python programs. These programs predict the next state and the reward, which lets the agent plan its actions.
- **Sample Efficiency**: Compared to deep reinforcement learning (deep RL), the method is more sample-efficient: it needs fewer interactions with the environment to learn an effective world model.
- **Computational Efficiency**: Compared to language-model-based agents such as ReAct, the method is more compute-efficient, because once the world model is constructed, subsequent action decisions no longer require frequent calls to a large language model (LLM).
- **Knowledge Transfer**: By editing its code, the agent can adapt quickly to a different environment instead of learning each new task from scratch.
- **Goal-Driven Exploration**: The paper introduces a learning objective that keeps the agent optimistic in tasks with uncertainty and sparse rewards, which promotes goal-oriented exploration.
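The ideas described above (a world model written as Python code, checked against observed interactions, and required to be optimistic about reachable reward) can be sketched roughly as follows. This is a toy illustration and not the paper's implementation: the 3x3 grid, the wall and goal positions, the function names `transition`, `reward`, `fits_data`, `plan`, and `is_optimistic`, and the breadth-first planner are all assumptions made for this example, and the model here is written by hand purely for illustration.

```python
from collections import deque
from typing import List, Optional, Tuple

# Hypothetical toy setup: the state is just the agent's (row, col) position.
State = Tuple[int, int]
Action = str
Transition = Tuple[State, Action, State, float]  # (s, a, s', r)

SIZE = 3                 # assumed 3x3 grid
WALLS = {(1, 1)}         # assumed wall cell
GOAL = (2, 2)            # assumed goal cell
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def transition(state: State, action: Action) -> State:
    """Candidate world model: predict the next state."""
    dr, dc = MOVES[action]
    nxt = (state[0] + dr, state[1] + dc)
    in_bounds = 0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE
    # Moving into a wall or off the grid leaves the agent where it was.
    return nxt if in_bounds and nxt not in WALLS else state


def reward(state: State, action: Action, next_state: State) -> float:
    """Candidate world model: predict the reward."""
    return 1.0 if next_state == GOAL else 0.0


def fits_data(replay: List[Transition]) -> bool:
    """The model should explain every interaction observed so far."""
    return all(
        transition(s, a) == s2 and reward(s, a, s2) == r
        for s, a, s2, r in replay
    )


def plan(start: State, max_depth: int = 20) -> Optional[List[Action]]:
    """Breadth-first search inside the model for a rewarding action sequence."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if len(actions) >= max_depth:
            continue
        for action in MOVES:
            nxt = transition(state, action)
            if reward(state, action, nxt) > 0:
                return actions + [action]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None


def is_optimistic(start: State) -> bool:
    """Optimism check: the planner can find reward according to the model."""
    return plan(start) is not None


if __name__ == "__main__":
    # Two observed interactions; the second one bumps into the assumed wall.
    replay = [((0, 0), "right", (0, 1), 0.0), ((0, 1), "down", (0, 1), 0.0)]
    print("explains data:", fits_data(replay))
    print("optimistic:", is_optimistic((0, 0)))
    print("plan:", plan((0, 0)))
```

In this sketch, a candidate model would be accepted only if both `fits_data` and `is_optimistic` hold. Once such a model exists, action selection reduces to calling `plan`, which involves no language-model calls, and adapting to a new environment amounts to editing `transition` or `reward`.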
### Experimental Validation

- **Sokoban**: Experiments show that the agent learns the basic rules of Sokoban in a short time and can handle levels with more boxes. Compared to deep reinforcement learning, the method is significantly more sample-efficient.
- **Minigrid**: A series of experiments in Minigrid environments validates the agent's ability to transfer knowledge between tasks and its zero-shot generalization to new tasks.
- **AlfWorld**: In the AlfWorld robotic task planning environment, the agent demonstrates that it can learn in complex tasks and that its exploration strategy is effective.

### Summary

The paper "WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment" addresses rapid learning, knowledge transfer, interpretability, and goal-driven exploration by constructing code-based world models, offering new insights into efficient agent learning in complex tasks.