Search-Based LLMs for Code Optimization

Shuzheng Gao, Cuiyun Gao, Wenchao Gu, Michael Lyu
2024-08-22
Abstract: The code written by developers usually suffers from efficiency problems and contains various performance bugs. These inefficiencies necessitate research into automated refactoring methods for code optimization. Early research in code optimization employs rule-based methods and focuses on specific inefficiency issues; these methods are labor-intensive and suffer from low coverage. Recent work regards the task as a sequence generation problem and resorts to deep learning (DL) techniques such as large language models (LLMs). These methods typically prompt LLMs to directly generate optimized code. Although they show state-of-the-art performance, such a one-step generation paradigm makes it hard to reach an optimal solution. First, complex optimization methods, such as combinatorial ones, are hard for LLMs to capture. Second, the one-step generation paradigm makes it challenging to precisely infuse LLMs with the knowledge required for effective code optimization, resulting in under-optimized code. To address these problems, we propose to model this task from the search perspective, and propose a search-based LLMs framework named SBLLM that enables iterative refinement and discovery of improved optimization methods.
SBLLM synergistically integrates LLMs with evolutionary search and consists of three key components: 1) an execution-based representative sample selection part that evaluates the fitness of each existing optimized code and prioritizes promising ones to pilot the generation of improved code; 2) an adaptive optimization pattern retrieval part that infuses targeted optimization patterns into the model, guiding LLMs towards rectifying and progressively enhancing their optimization methods; and 3) a genetic operator-inspired chain-of-thought prompting part that aids LLMs in combining different optimization methods and generating improved optimization methods.
Software Engineering, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper addresses the efficiency problems and performance bugs common in developer-written code. Traditional rule-based methods target specific kinds of inefficiency, but they are labor-intensive and have limited coverage. More recent work prompts Large Language Models (LLMs) to generate optimized code directly. These methods perform well, but the one-step generation paradigm struggles to capture complex optimization methods, such as combinations of several optimizations. It also makes it hard to precisely infuse LLMs with the knowledge that effective code optimization requires, so the generated code is often under-optimized.

To address these issues, the paper proposes SBLLM (Search-Based LLMs), a framework that models code optimization as a search problem. SBLLM combines LLMs with evolutionary search and has three main components:

1. **Execution-based Representative Sample Selection**: Evaluates the fitness of each existing optimized candidate by executing it, then prioritizes candidates with efficient and distinct optimization methods to guide the next round of generation.
2. **Adaptive Optimization Pattern Retrieval**: Retrieves targeted optimization patterns and injects them into the prompt as domain knowledge, guiding the LLM to correct and progressively improve its optimization methods.
3. **Genetic Operator-Inspired Chain-of-Thought Prompting**: Uses a Chain-of-Thought (CoT) prompt modeled on crossover and mutation to help the LLM combine different optimization methods and produce improved code.

Experimental results show that SBLLM significantly outperforms baseline methods in improving the efficiency of Python and C++ code. It improves program execution efficiency by up to 209.59% and, across different LLMs, raises speedup rates over the baselines by 8.75%~28.06% and 1.15%~9.56%.
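The three components described above can be pictured as one small search loop. The following is a minimal sketch under stated assumptions, not the paper's implementation: every function name, the token-overlap retrieval, the prompt wording, and the `llm` and `fitness` callables are hypothetical stand-ins. Here `fitness` is assumed to return a measured runtime (lower is better) and `INVALID` for a candidate that fails its tests.

```python
from typing import Callable, Dict, List, Tuple

INVALID = float("inf")  # hypothetical fitness for candidates that fail their tests


def select_representatives(scored: List[Tuple[str, float]], k: int) -> List[str]:
    """Return up to k distinct, valid candidates, fastest first."""
    picked: List[str] = []
    for code, runtime in sorted(scored, key=lambda pair: pair[1]):
        if runtime != INVALID and code not in picked:
            picked.append(code)
    return picked[:k]


def retrieve_pattern(code: str, patterns: Dict[str, str]) -> str:
    """Toy retrieval: return the hint whose key shares the most tokens with code."""
    tokens = set(code.split())
    best_key = max(patterns, key=lambda key: len(tokens & set(key.split())))
    return patterns[best_key]


def build_prompt(parents: List[str], pattern: str) -> str:
    """Ask the model to combine the parents (crossover), then refine (mutation)."""
    listing = "\n".join(
        f"Candidate {i + 1}:\n{code}" for i, code in enumerate(parents)
    )
    return (
        f"{listing}\n"
        f"Optimization hint: {pattern}\n"
        "Step 1 (crossover): combine the useful ideas from the candidates.\n"
        "Step 2 (mutation): refine the result using the hint.\n"
    )


def search(
    seed: str,
    llm: Callable[[str], List[str]],
    fitness: Callable[[str], float],
    patterns: Dict[str, str],
    iterations: int = 3,
    k: int = 2,
) -> str:
    """Iteratively select, retrieve, and prompt; return the fastest candidate seen."""
    population = [seed]
    for _ in range(iterations):
        scored = [(code, fitness(code)) for code in population]
        # Fall back to the seed if every candidate failed its tests.
        parents = select_representatives(scored, k) or [seed]
        prompt = build_prompt(parents, retrieve_pattern(parents[0], patterns))
        # Keep the parents so a bad generation never loses the current best.
        population = parents + llm(prompt)
    return min(population, key=fitness)
```

In practice `llm` would wrap a model call and `fitness` would run each candidate against its test cases while timing it. Keeping both as injected callables makes the loop easy to exercise with deterministic stubs.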