RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation

Qingyao Li, Wei Xia, Kounianhua Du, Xinyi Dai, Ruiming Tang, Yasheng Wang, Yong Yu, Weinan Zhang
2024-09-15
Abstract: LLM agents enhanced by tree search algorithms have yielded notable performances in code generation. However, current search algorithms in this domain suffer from low search quality due to several reasons: 1) Ineffective design of the search space for the high-reasoning demands of code generation tasks, 2) Inadequate integration of code feedback with the search algorithm, and 3) Poor handling of negative feedback during the search, leading to reduced search efficiency and quality. To address these challenges, we propose to search for the reasoning process of the code and use the detailed feedback of code execution to refine erroneous thoughts during the search. In this paper, we introduce RethinkMCTS, which employs the Monte Carlo Tree Search (MCTS) algorithm to conduct thought-level searches before generating code, thereby exploring a wider range of strategies. More importantly, we construct verbal feedback from fine-grained code execution feedback to refine erroneous thoughts during the search. This ensures that the search progresses along the correct reasoning paths, thus improving the overall search quality of the tree by leveraging execution feedback. Through extensive experiments, we demonstrate that RethinkMCTS outperforms previous search-based and feedback-based code generation baselines. On the HumanEval dataset, it improves the pass@1 of GPT-3.5-turbo from 70.12 to 89.02 and GPT-4o-mini from 87.20 to 94.51. It effectively conducts more thorough exploration through thought-level searches and enhances the search quality of the entire tree by incorporating rethink operation.
Software Engineering, Computation and Language
What problem does this paper attempt to address?
This paper addresses several key challenges in the search algorithms used for code generation tasks:

1. **Insufficient Modeling and Exploration of the Reasoning Process**: Previous search methods for code generation did not fully model or explore the reasoning that precedes writing code. Although Chain-of-Thought and Tree of Thoughts have shown that explicitly representing the reasoning process is important for success on reasoning tasks, earlier code generation work did not clearly model the relationship between reasoning and code.
2. **Ineffective Integration of Code Execution Feedback**: Unlike other reasoning tasks, code generation can benefit from the detailed feedback provided by the compiler. In earlier search algorithms, however, this feedback was usually just stored in memory and passed into subsequent generation. This approach is rudimentary and fails to use the feedback to identify and correct errors.
3. **Poor Error Handling in the Search Process**: When the search encounters poor evaluation results, previous methods usually adopted one of two strategies: self-reflection, which summarizes the experience and adds it to the context of subsequent exploration, or pruning the search tree directly to improve efficiency. Neither significantly improves search quality. Self-reflection summarizes past errors, but the erroneous steps remain in the exploration path, so subsequent search continues along the wrong track. Pruning improves efficiency, but it may also discard potentially valuable paths.
To solve these problems, the paper proposes **RethinkMCTS**, which uses the Monte Carlo Tree Search (MCTS) algorithm to perform a thought-level search before generating code, and uses detailed code execution feedback to correct erroneous thoughts, thereby improving the quality of the entire search tree. The main contributions of RethinkMCTS are:

- **A Reasoning-to-Code Search Framework for Code Generation**: The framework combines multi-step thinking with single-step code generation, and uses both verbal and scoring feedback to guide the growth of the MCTS tree. To the best of the authors' knowledge, this is the first attempt to search over and optimize the thinking process behind code in order to improve the performance of large language models (LLMs) in code generation.
- **Introducing Verbal Feedback to Correct Thinking Errors in MCTS**: With verbal feedback, RethinkMCTS can find and correct errors in the current thinking process instead of continuing along the wrong path.
- **Introducing Detailed Feedback and Dual Evaluation to Correct Errors**: Detailed feedback is used to locate block-level errors and guide the correction of erroneous thoughts. In addition, a dual evaluation method that combines visible tests with LLM evaluation is proposed to ensure effective code selection, especially in cases where visible tests alone are not sufficient for evaluation.

Through extensive experiments, the paper shows that RethinkMCTS outperforms previous search-based and feedback-based code generation baselines on multiple benchmark datasets. For example, on the HumanEval dataset, RethinkMCTS improves the pass@1 of GPT-3.5-turbo from 70.12 to 89.02 (+18.90 points) and that of GPT-4o-mini from 87.20 to 94.51 (+7.31 points). This indicates that thought-level search combined with execution feedback lets RethinkMCTS explore different strategies more thoroughly, thereby improving the quality of the entire search tree.
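The interplay between thought-level search, the rethink step, and dual evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. All names (`ThoughtNode`, `dual_evaluate`, `search`, and the callables `propose`, `write_code`, `run_tests`, `llm_score`, `rethink`) are hypothetical. Plain UCT is used for selection as a stand-in, since the summary does not specify the selection rule. The way the visible-test pass rate and the LLM score are blended, and the choice to rethink a thought only once, are also assumptions. The LLM and the test harness are replaced by toy stubs so the sketch runs on its own.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class ThoughtNode:
    """One reasoning step (a "thought") in the search tree."""
    thought: str
    parent: Optional["ThoughtNode"] = None
    children: List["ThoughtNode"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0  # running mean of backed-up rewards

    def path(self) -> List[str]:
        """Thoughts from just below the root down to this node."""
        node, thoughts = self, []
        while node.parent is not None:
            thoughts.append(node.thought)
            node = node.parent
        return thoughts[::-1]


def uct(child: ThoughtNode, parent_visits: int, c: float = 1.4) -> float:
    """Plain UCT score (assumed here; the paper may use another rule)."""
    if child.visits == 0:
        return math.inf
    return child.value + c * math.sqrt(math.log(parent_visits) / child.visits)


def dual_evaluate(pass_rate: float, llm_value: float, w: float = 0.5) -> float:
    """Assumed blend: the LLM score only matters once all visible tests pass."""
    if pass_rate < 1.0:
        return pass_rate
    return w * pass_rate + (1 - w) * llm_value


def search(root, propose, write_code, run_tests, llm_score, rethink, iterations=4):
    best_code, best_reward = None, -1.0
    for _ in range(iterations):
        node = root
        while node.children:  # selection
            node = max(node.children, key=lambda ch: uct(ch, node.visits))
        for thought in propose(node.path()):  # expansion
            node.children.append(ThoughtNode(thought, parent=node))
        if node.children:
            node = node.children[0]
        code = write_code(node.path())  # evaluation on visible tests
        pass_rate, feedback = run_tests(code)
        if pass_rate < 1.0 and node.parent is not None:
            # Rethink: rewrite the failing thought in place using the verbal
            # feedback, so later search continues from the corrected path.
            node.thought = rethink(node.path(), feedback)
            code = write_code(node.path())
            pass_rate, feedback = run_tests(code)
        llm_value = llm_score(code) if pass_rate == 1.0 else 0.0
        reward = dual_evaluate(pass_rate, llm_value)
        if reward > best_reward:
            best_code, best_reward = code, reward
        while node is not None:  # backpropagation
            node.visits += 1
            node.value += (reward - node.value) / node.visits
            node = node.parent
    return best_code, best_reward


# Toy stand-ins for the LLM and the test harness (purely illustrative).
def propose(path):
    return ["buggy idea"]

def write_code(path):
    return "\n".join(f"# {t}" for t in path)

def run_tests(code):
    ok = "sort" in code and "buggy" not in code
    return (1.0, "") if ok else (0.0, "block 1: output is not sorted")

def rethink(path, feedback):
    return "sort the list"

def llm_score(code):
    return 0.8

root = ThoughtNode("")
code, reward = search(root, propose, write_code, run_tests, llm_score,
                      rethink, iterations=2)
```

In this toy run the first proposed thought fails the visible tests, is rewritten in place by `rethink`, and the tree then continues from the corrected thought. This mirrors the contrast drawn above with self-reflection, where the erroneous step stays in the path, and with pruning, where the branch is discarded.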