RoT: Enhancing Large Language Models with Reflection on Search Trees

Wenyang Hui, Kewei Tu
2024-07-18
Abstract: Large language models (LLMs) have demonstrated impressive capability in reasoning and planning when integrated with tree-search-based prompting methods. However, since these methods ignore previous search experiences, they often repeat the same mistakes in the search process. To address this issue, we introduce Reflection on search Trees (RoT), an LLM reflection framework designed to improve the performance of tree-search-based prompting methods. It uses a strong LLM to summarize guidelines from previous tree search experiences to enhance the ability of a weak LLM. The guidelines are instructions for solving the task through tree search, and they can prevent the weak LLM from repeating mistakes made in past search processes. In addition, we propose a novel state selection method, which identifies the critical information in historical search processes to help RoT generate more specific and meaningful guidelines. In our extensive experiments, we find that RoT significantly improves the performance of LLMs in reasoning and planning tasks with various tree-search-based prompting methods (e.g., BFS and MCTS). Non-tree-search-based prompting methods such as Chain-of-Thought (CoT) can also benefit from RoT guidelines, since RoT can provide task-specific knowledge collected from the search experience.
Computation and Language
What problem does this paper attempt to address?
### The Problem Addressed by the Paper

The paper "RoT: Enhancing Large Language Models with Reflection on Search Trees" addresses the tendency of large language models (LLMs) to repeat the same mistakes when used with tree search methods.

#### Background Problem

Existing tree search methods can significantly improve model performance in multi-step reasoning and planning tasks, but they ignore past search experiences, so the same mistakes recur during search. Specifically, the model may:

- Incorrectly evaluate actions.
- Generate actions that lead to inefficient results.
- Fail to accurately predict the next state.

These problems lower accuracy and search efficiency, causing the model to over-explore erroneous action paths.

#### Solution

To address these issues, the authors introduce a new framework, **Reflection on Search Trees (RoT)**. RoT aims to improve the performance of tree search methods by reflecting on past search experiences. It uses a strong LLM to summarize guidelines from past search processes and applies these guidelines to a weaker LLM, helping it avoid repeated mistakes and make better decisions.

#### Key Techniques

1. **Important State Selection**: Selecting key states from the generated search tree that have a significant impact on the final result.
2. **Guideline Generation**: Generating specific guidelines based on the selected important states to help the model make better decisions in future search processes.
3. **Iterative Improvement**: Gradually refining the search tree and guidelines through multiple applications of RoT, further enhancing the model's performance.

#### Experimental Validation

The authors evaluated RoT on several complex reasoning and planning tasks, including:

- **Blocksworld**: Manipulating blocks to reach a target state from an initial state.
- **GSM8k**: Mathematical reasoning tasks.
- **CraigslistBargain**: Bargaining tasks between buyers and sellers.

Experimental results show that RoT significantly improves the performance of various LLMs on these tasks, and that its effect becomes more pronounced as task difficulty increases.

### Summary

RoT addresses the issue of LLMs repeating mistakes in tree search methods by reflecting on past search experiences and generating guidelines, which improves the model's search efficiency and accuracy. The method applies to tree search methods and can also enhance non-tree-search methods such as chain-of-thought.
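
The first two key techniques described above, important state selection and guideline generation, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Node` structure, the importance score (here, the largest change in estimated value between a state and any of its children), the threshold, the prompt wording, and the `strong_llm` callable are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One state in a finished search tree (hypothetical structure)."""
    state: str
    value: float                 # value estimate produced by the search
    action: str = ""             # action that led to this state
    children: List["Node"] = field(default_factory=list)


def importance(node: Node) -> float:
    """Assumed heuristic: largest value change between a state and any child."""
    if not node.children:
        return 0.0
    return max(abs(child.value - node.value) for child in node.children)


def select_important_states(root: Node, threshold: float) -> List[Node]:
    """Return states whose importance meets the threshold, in pre-order."""
    selected = []
    stack = [root]
    while stack:
        node = stack.pop()
        if importance(node) >= threshold:
            selected.append(node)
        # Reverse so that children are visited left to right.
        stack.extend(reversed(node.children))
    return selected


def build_reflection_prompt(nodes: List[Node]) -> str:
    """Format the selected states and their outgoing actions (assumed wording)."""
    lines = ["Summarize guidelines from these search experiences:"]
    for node in nodes:
        lines.append(f"State: {node.state}")
        for child in node.children:
            lines.append(f"  Action: {child.action} -> value {child.value:.2f}")
    return "\n".join(lines)


def reflect(root: Node, threshold: float, strong_llm: Callable[[str], str]) -> str:
    """Ask a strong LLM (any prompt-to-text callable) for guidelines."""
    important = select_important_states(root, threshold)
    return strong_llm(build_reflection_prompt(important))
```

Under these assumptions, the returned guidelines would be prepended to the weaker model's prompts in the next round of search. Repeating the search-then-reflect cycle corresponds to the iterative improvement step.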