Abstract: Recently, various studies have leveraged Large Language Models (LLMs) to help decision-making and planning in environments, and try to align the LLMs' knowledge with the world conditions. Nonetheless, the capacity of LLMs to continuously acquire environmental knowledge and adapt in an open world remains uncertain. In this paper, we propose an approach to spur LLMs to explore the open world, gather experiences, and learn to improve their task-solving capabilities. In this approach, a multi-round feedback-revision mechanism is utilized to encourage LLMs to actively select appropriate revision actions guided by feedback information from the environment. This facilitates exploration and enhances the model's performance. Besides, we integrate sub-task relabeling to assist LLMs in maintaining consistency in sub-task planning and help the model learn the combinatorial nature between tasks, enabling it to complete a wider range of tasks through training based on the acquired exploration experiences. By evaluation in Minecraft, an open-ended sandbox world, we demonstrate that our approach LLaMA-Rider enhances the efficiency of the LLM in exploring the environment, and effectively improves the LLM's ability to accomplish more tasks through fine-tuning with merely 1.3k instances of collected data, showing minimal training costs compared to the baseline using reinforcement learning.

What problem does this paper attempt to address?

The paper focuses on applying large language models (LLMs) in open-world environments, in particular on how these models can keep acquiring new knowledge by exploring the environment and thereby improve their problem-solving abilities. Specifically, it addresses the following core issues:

1. **Environmental Adaptability and Knowledge Update**: Although current LLMs are powerful, their knowledge comes mainly from the pre-training corpus, which can leave it out of step with the actual conditions of a specific environment. The paper therefore asks how LLMs can continuously adjust and update their knowledge based on environmental feedback.
2. **Handling Complex Tasks**: In open environments like Minecraft, tasks are often complex, involving multiple sub-tasks and requiring each step to be executed precisely. The high degree of freedom also means there are many potentially invalid actions. The paper explores how LLMs can explore such environments effectively and complete the related tasks.
3. **Multi-task and Generalization Ability**: The paper also considers how LLMs can go beyond completing specific tasks, handle multiple tasks, and generalize what they have learned to new tasks.

To address these challenges, the paper proposes a method called LLaMA-Rider, which has two stages: an exploration stage and a learning stage. In the exploration stage, a multi-round feedback-revision mechanism encourages the LLM to explore the environment autonomously and collect successful experiences. In the learning stage, supervised fine-tuning (SFT) trains the model on the collected experiences to strengthen its task-solving abilities.
Through experimental evaluation on the Minecraft simulation platform MineDojo, the paper shows that LLaMA-Rider is effective: even when fine-tuned on a relatively small dataset of only 1.3k instances, the method improves the LLM's efficiency in exploring the environment and its ability to complete tasks. The experimental results also indicate that the method improves the model's generalization to new tasks.
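
The exploration and learning stages described above can be illustrated with a small Python sketch. It is not the authors' implementation: `ToyCraftEnv`, `toy_policy`, `explore`, `relabel`, and `to_sft_instances` are hypothetical names, the three-item crafting chain stands in for MineDojo, and a scripted function stands in for the LLM. The relabeling rule (using the sub-task a step actually completed as that step's goal) is one plausible reading of "sub-task relabeling", labeled here as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Step:
    task: str         # goal shown in the prompt
    observation: str  # textual description of the environment state
    action: str       # action that the environment accepted


class ToyCraftEnv:
    """Hypothetical stand-in for an open world: stick <- planks <- log."""
    RECIPES = {"log": [], "planks": ["log"], "stick": ["planks"]}

    def __init__(self):
        self.inventory = set()

    def observe(self):
        return "inventory: " + (", ".join(sorted(self.inventory)) or "empty")

    def step(self, action):
        """Try to obtain an item; return (success, textual feedback)."""
        if action not in self.RECIPES:
            return False, f"unknown item '{action}'"
        missing = [r for r in self.RECIPES[action] if r not in self.inventory]
        if missing:
            return False, f"cannot get {action}: missing {missing[0]}"
        self.inventory.add(action)
        return True, f"obtained {action}"


def toy_policy(task, observation, feedback):
    """Scripted stand-in for the LLM: propose the goal item, or revise the
    plan to fetch the prerequisite named in the environment feedback."""
    if feedback and "missing " in feedback:
        return feedback.split("missing ")[1]
    return task


def explore(env, policy, task, max_steps=10, max_revisions=3):
    """Multi-round feedback-revision loop; returns (trajectory, success)."""
    trajectory = []
    for _ in range(max_steps):
        obs = env.observe()
        feedback = None
        for _ in range(max_revisions + 1):
            action = policy(task, obs, feedback)
            ok, feedback = env.step(action)
            if ok:
                break
        else:
            return trajectory, False  # revision budget exhausted
        trajectory.append(Step(task, obs, action))
        if action == task:
            return trajectory, True
    return trajectory, False


def relabel(trajectory):
    """Assumed relabeling rule: a step that completed a sub-task is also
    stored with that sub-task as its goal."""
    return [Step(s.action, s.observation, s.action)
            for s in trajectory if s.action != s.task]


def to_sft_instances(steps):
    """Turn steps into prompt/completion pairs for supervised fine-tuning."""
    return [{"prompt": f"Task: {s.task}\n{s.observation}",
             "completion": s.action} for s in steps]


trajectory, success = explore(ToyCraftEnv(), toy_policy, "stick")
# Only successful episodes contribute training data in this sketch.
dataset = to_sft_instances(trajectory + relabel(trajectory)) if success else []
```

In this toy run the policy first proposes the goal item, is told which prerequisite is missing, and revises until the environment accepts an action. The relabeled copies let the same experience also serve as training data for the simpler sub-tasks, which is the intuition behind learning how tasks compose.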

LLaMA Rider: Spurring Large Language Models to Explore the Open World
