Jonathan Light, Yue Wu, Yiyou Sun, Wenchao Yu, Yanchi Liu, Xujiang Zhao, Ziniu Hu, Haifeng Chen, Wei Cheng
Abstract: We propose a novel approach to scaling LLM inference for code generation. We frame code generation as a black box optimization problem within the code space, and employ optimization-inspired techniques to enhance exploration. Specifically, we introduce Scattered Forest Search to enhance solution diversity while searching for solutions. Our theoretical analysis illustrates how these methods avoid local optima during optimization. Extensive experiments on HumanEval, MBPP, APPS, CodeContests, and Leetcode reveal significant performance improvements. For instance, our method achieves a pass@1 rate of 67.1% on HumanEval+ and 87.2% on HumanEval with GPT-3.5, marking improvements of 8.6% and 4.3% over the state-of-the-art, while also halving the iterations needed to find the correct solution. Furthermore, our method scales more efficiently than existing search techniques, including tree search, line search, and repeated sampling.
What problem does this paper attempt to address?
The paper asks how to scale LLM inference for code generation more effectively, so that generated code is more accurate and the correct solution is found in fewer iterations. Specifically, the authors frame code generation as a black-box optimization problem over the code space and introduce optimization-inspired techniques that enhance exploration and avoid local optima.
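As a sketch of this framing, with notation that is assumed here rather than taken from the paper: let $\mathcal{S}$ be the space of candidate programs and let $f(s)$ be the feedback a candidate $s$ receives, for example the fraction of validation tests it passes. The search then looks for

$$
s^{*} = \arg\max_{s \in \mathcal{S}} f(s),
$$

where $f$ can only be evaluated by generating a candidate and running it. No gradient of $f$ is available, which is what makes the problem black-box.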
### Problem Background
Currently, code generation mainly relies on repeated sampling (such as Best-of-N sampling) or tree-search-based methods. Although these methods are effective, they have the following shortcomings:
1. **Insufficient exploration**: Existing methods often produce similar solutions, resulting in insufficient exploration of the solution space.
2. **Prone to getting stuck in local optima**: Especially with line search, once refinement in a given direction stops paying off, it is difficult to return to an earlier state.
3. **Low efficiency**: It takes multiple iterations to find the correct solution, and as the problem complexity increases, the efficiency drops significantly.
### Proposed Method
To overcome these problems, the authors propose **Scattered Forest Search (SFS)**, which combines three key techniques:
1. **Scattering**:
- Dynamically vary the input prompt so that the LLM generates more diverse outputs.
- At each branch, the LLM proposes different textual optimization directions and steps, analogous to gradients in numerical optimization.
2. **Foresting**:
- As in multi-start optimization, start the tree search from multiple random seed solutions.
- Ensure that the initial solutions are distributed throughout the search space, thereby enhancing the breadth of exploration.
3. **Scouting**:
- Inspired by ant colony optimization and particle swarm optimization, share what succeeded and what failed across branches.
- When one branch finds that a particular direction works, other branches can adjust their search strategies accordingly, making more efficient use of the feedback.
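
The three techniques above can be illustrated with a toy sketch. It is not the paper's implementation: the LLM is replaced by a fixed set of hypothetical edit directions, a "program" is just a pair `(a, b)` standing for `f(x) = a * x + b`, the score is a made-up error measure on three hypothetical tests, and greedy best-first expansion stands in for the tree-search policy. All names (`TESTS`, `DIRECTIONS`, `scatter`, `scattered_forest_search`) are assumptions introduced for illustration.

```python
from typing import Dict, List, Set, Tuple

Solution = Tuple[int, int]  # toy "program": f(x) = a * x + b, stored as (a, b)

# Hypothetical validation tests (input, expected output) for the target f(x) = 2x + 3.
TESTS: List[Tuple[int, int]] = [(0, 3), (1, 5), (2, 7)]

# Hypothetical improvement directions; in SFS these would be text proposed by the LLM.
DIRECTIONS: Dict[str, Solution] = {
    "raise slope": (1, 0),
    "lower slope": (-1, 0),
    "raise offset": (0, 1),
    "lower offset": (0, -1),
}


def score(sol: Solution) -> int:
    """Black-box feedback: negative total error on the tests (0 means all pass)."""
    a, b = sol
    return -sum(abs(a * x + b - y) for x, y in TESTS)


def scatter(untried: List[str], insights: Dict[str, int], k: int) -> List[str]:
    """Scattering: return up to k distinct directions, best-rated by scouting first."""
    # sorted() is stable, so equally rated directions keep their original order.
    return sorted(untried, key=lambda d: insights[d], reverse=True)[:k]


def scattered_forest_search(
    seeds: List[Solution], k: int = 2, max_iters: int = 100
) -> Solution:
    insights = {d: 0 for d in DIRECTIONS}  # scouting: feedback shared by all trees
    # Foresting: every seed is the root of its own tree; nodes map to tried directions.
    tried: Dict[Solution, Set[str]] = {s: set() for s in seeds}
    for _ in range(max_iters):
        best = max(tried, key=score)
        if score(best) == 0:
            return best
        open_nodes = [n for n in tried if len(tried[n]) < len(DIRECTIONS)]
        if not open_nodes:
            break
        node = max(open_nodes, key=score)  # greedy stand-in for a tree-search policy
        untried = [d for d in DIRECTIONS if d not in tried[node]]
        for d in scatter(untried, insights, k):
            tried[node].add(d)
            da, db = DIRECTIONS[d]
            child = (node[0] + da, node[1] + db)
            # Scouting update: reward directions that improved the score, penalize the rest.
            insights[d] += 1 if score(child) > score(node) else -1
            tried.setdefault(child, set())
    return max(tried, key=score)


result = scattered_forest_search(seeds=[(0, 0), (4, 0), (0, 6)])
print(result)
```

With these seeds the toy run ends at `(2, 3)`, i.e. `f(x) = 2x + 3`. In this landscape `(3, 2)` is a local optimum for single-step edits (all four neighbours score lower), so a search that only ever accepts improvements would stall there; keeping every explored node open lets the sketch back off and continue from a slightly worse candidate.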
### Experimental Results
The authors ran experiments on multiple code-generation benchmarks, including HumanEval, MBPP, APPS, CodeContests, and Leetcode. The results show that:
- SFS significantly improves the pass@1 rate. For example, with GPT-3.5 it reaches 67.1% on HumanEval+ and 87.2% on HumanEval, improvements of 8.6% and 4.3% over the previous state of the art.
- SFS reduces the number of iterations required to find the correct solution; on average, only half as many iterations are needed.
- The solutions generated by SFS are more diverse while maintaining a high validation score, indicating that it achieves a better balance between exploration and exploitation.
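
For readers unfamiliar with the metric, pass@k is commonly estimated from `n` generated samples of which `c` pass the hidden tests, using the unbiased estimator `1 - C(n - c, k) / C(n, k)`. The sketch below shows that general estimator; whether the paper computes its pass@1 figures exactly this way is not stated in this summary, and the function name `pass_at_k` is an assumption.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples, c of which are correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples contains no correct one,
    # subtracted from 1. math.comb returns 0 when k > n - c.
    return 1.0 - comb(n - c, k) / comb(n, k)


# For k = 1 the estimator reduces to the plain success fraction c / n.
print(pass_at_k(n=10, c=3, k=1))
```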
### Summary
The main contributions of this paper are:
1. Formalize code generation as a black-box optimization problem, emphasizing the balance between exploration and exploitation.
2. Propose the Scattered Forest Search (SFS) method, which combines Scattering, Foresting, and Scouting to make exploration of the code space more efficient.
3. Demonstrate the superior performance of SFS on multiple code-generation benchmarks, showing gains in accuracy, efficiency, and solution diversity.
Together, these contributions give code-generation research a new set of search techniques for scaling LLM inference.