LLaMA Rider: Spurring Large Language Models to Explore the Open World

Yicheng Feng, Yuxuan Wang, Jiazheng Liu, Sipeng Zheng, Zongqing Lu
2023-10-13
Abstract: Recently, various studies have leveraged Large Language Models (LLMs) to aid decision-making and planning in environments, and have tried to align the LLMs' knowledge with the world conditions. Nonetheless, the capacity of LLMs to continuously acquire environmental knowledge and adapt in an open world remains uncertain. In this paper, we propose an approach to spur LLMs to explore the open world, gather experiences, and learn to improve their task-solving capabilities. In this approach, a multi-round feedback-revision mechanism is utilized to encourage LLMs to actively select appropriate revision actions guided by feedback information from the environment. This facilitates exploration and enhances the model's performance. Besides, we integrate sub-task relabeling to assist LLMs in maintaining consistency in sub-task planning and to help the model learn the combinatorial nature between tasks, enabling it to complete a wider range of tasks through training based on the acquired exploration experiences. Through evaluation in Minecraft, an open-ended sandbox world, we demonstrate that our approach LLaMA-Rider enhances the efficiency of the LLM in exploring the environment, and effectively improves the LLM's ability to accomplish more tasks through fine-tuning with merely 1.3k instances of collected data, showing minimal training costs compared to the baseline using reinforcement learning.
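The abstract names sub-task relabeling without spelling out its mechanics. One plausible reading, sketched here under our own assumptions, is that each step of an episode is annotated with the sub-task it actually served, and the task named in that step's prompt is rewritten to that sub-task. The prompt then stays consistent with the current sub-goal, and a single episode for `stick` also yields training instances for `planks`. The episode data, the prompt format, and the `relabel` function are hypothetical illustrations, not the paper's implementation.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical successful episode for the task "stick" (not from the paper).
# Each step records: (task in prompt, sub-task actually pursued, inventory seen, action).
EPISODE: list[tuple[str, str, str, str]] = [
    ("stick", "planks", "{}", "log"),
    ("stick", "planks", "{'log': 1}", "planks"),
    ("stick", "planks", "{'planks': 1}", "log"),
    ("stick", "planks", "{'log': 1, 'planks': 1}", "planks"),
    ("stick", "stick", "{'planks': 2}", "stick"),
]


def relabel(
    episode: list[tuple[str, str, str, str]],
) -> tuple[list[tuple[str, str]], dict[str, int]]:
    """Rewrite each step's prompt so it names the sub-task the step served.

    Returns the relabeled (prompt, action) pairs and the number of training
    instances obtained for each (sub-)task name.
    """
    pairs = [(f"Task: {sub} | Inventory: {inv}", action)
             for _task, sub, inv, action in episode]
    counts = Counter(sub for _task, sub, _inv, _action in episode)
    return pairs, dict(counts)


pairs, counts = relabel(EPISODE)
print(counts)  # {'planks': 4, 'stick': 1}
```

Under this reading, the four `planks` pairs teach a reusable sub-skill that can be recombined when a later task (say, one that also needs planks) is attempted, which is one way to interpret "the combinatorial nature between tasks".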
Machine Learning
What problem does this paper attempt to address?
The paper focuses on applying large language models (LLMs) in open-world environments, and in particular on how such models can continuously acquire new knowledge by exploring the environment and thereby improve their problem-solving abilities. Specifically, it addresses the following core issues:

1. **Environmental Adaptability and Knowledge Update**: Although current LLMs are highly capable, their knowledge comes mainly from the pre-training corpus, so it may not match the actual conditions of a specific environment. The paper therefore asks how LLMs can continuously adjust and update their knowledge based on environmental feedback.
2. **Handling Complex Tasks**: In open environments like Minecraft, tasks are often complex, consist of multiple sub-tasks, and require each step to be executed correctly. In addition, the high degree of freedom means that many possible actions are invalid. The paper explores how LLMs can explore such environments effectively and complete the associated tasks.
3. **Multi-task and Generalization Ability**: The paper also studies how LLMs can not only complete specific tasks but also handle multiple tasks and generalize the learned knowledge to new tasks.

To address these challenges, the paper proposes LLaMA-Rider, a method with two stages: an exploration stage and a learning stage. In the exploration stage, a multi-round feedback-revision mechanism encourages the LLM to explore the environment autonomously and collect successful experiences. In the learning stage, the LLM is trained on the collected experiences with supervised fine-tuning (SFT) to enhance its task-solving abilities.
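The two stages can be illustrated with a small sketch. It assumes a hypothetical toy crafting environment (`ToyEnv`, `RECIPES`) in place of MineDojo, and replaces the LLM with a rule-based stand-in (`stub_policy`) that revises its action by reading the environment's failure feedback. Within each step the agent gets up to `max_revisions` revision rounds; the final successful action of each step is recorded as a (prompt, action) pair, and only successful episodes are kept as SFT data. These names, the feedback format, and the choice of what to record are assumptions for illustration, not the paper's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical toy recipes: item -> ingredients consumed to obtain it.
RECIPES: dict[str, dict[str, int]] = {
    "log": {},              # can be gathered directly
    "planks": {"log": 1},
    "stick": {"planks": 2},
}


@dataclass
class ToyEnv:
    """Stand-in for an open-world environment (not MineDojo)."""
    inventory: dict[str, int] = field(default_factory=dict)

    def step(self, item: str) -> tuple[bool, str]:
        """Try to obtain `item`; return (success, textual feedback)."""
        missing = [f"{n} {ing}" for ing, n in RECIPES[item].items()
                   if self.inventory.get(ing, 0) < n]
        if missing:
            return False, f"Failed to get {item}: need " + ", ".join(missing)
        for ing, n in RECIPES[item].items():
            self.inventory[ing] -= n
        self.inventory[item] = self.inventory.get(item, 0) + 1
        return True, f"Got {item}"


def stub_policy(task: str, feedback: Optional[str]) -> str:
    """Rule-based stand-in for the LLM: propose the task itself or, given
    failure feedback, revise to the first missing ingredient it names."""
    if feedback and "need " in feedback:
        first = feedback.split("need ", 1)[1].split(",")[0]  # e.g. "2 planks"
        return first.split(" ", 1)[1]                        # -> "planks"
    return task


Policy = Callable[[str, Optional[str]], str]


def run_episode(task: str, policy: Policy, env: ToyEnv,
                max_steps: int = 10, max_revisions: int = 3):
    """Explore with multi-round feedback revision.

    Returns (success, data): data holds a (prompt, action) pair for each step
    whose final action succeeded; a failed episode returns (False, []).
    """
    data: list[tuple[str, str]] = []
    for _ in range(max_steps):
        prompt = f"Task: {task} | Inventory: {dict(sorted(env.inventory.items()))}"
        feedback: Optional[str] = None
        for _ in range(max_revisions + 1):  # initial attempt + revision rounds
            action = policy(task, feedback)
            ok, feedback = env.step(action)
            if ok:
                break
        else:
            return False, []  # revisions exhausted at this step: discard episode
        data.append((prompt, action))  # record only the action that succeeded
        if action == task:
            return True, data
    return False, []
```

With the defaults, `run_episode("stick", stub_policy, ToyEnv())` succeeds after five recorded steps (log, planks, log, planks, stick). With `max_revisions=1`, the stand-in cannot trace the missing ingredients back to `log` within one step, so the episode is discarded. In the setting the paper describes, the LLM itself plays the stub's role, and the retained pairs from successful episodes form the fine-tuning data.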
Through experiments on the Minecraft simulation platform MineDojo, the paper demonstrates the effectiveness of LLaMA-Rider. It shows that even with only 1.3k collected training instances, the method improves both the LLM's efficiency in exploring the environment and its ability to complete tasks. The results also indicate that the method improves the model's generalization to new tasks.