Large Language Model for Vulnerability Detection and Repair: Literature Review and the Road Ahead

Xin Zhou, Sicong Cao, Xiaobing Sun, David Lo
2024-10-07
Abstract: The significant advancements in Large Language Models (LLMs) have resulted in their widespread adoption across various tasks within Software Engineering (SE), including vulnerability detection and repair. Numerous studies have investigated the application of LLMs to enhance vulnerability detection and repair tasks. Despite the increasing research interest, there is currently no existing survey that focuses on the utilization of LLMs for vulnerability detection and repair. In this paper, we aim to bridge this gap by offering a systematic literature review of approaches aimed at improving vulnerability detection and repair through the utilization of LLMs. The review encompasses research work from leading SE, AI, and Security conferences and journals, encompassing 43 papers published across 25 distinct venues, along with 15 high-quality preprint papers, bringing the total to 58 papers. By answering three key research questions, we aim to (1) summarize the LLMs employed in the relevant literature, (2) categorize various LLM adaptation techniques in vulnerability detection, and (3) classify various LLM adaptation techniques in vulnerability repair. Based on our findings, we have identified a series of limitations of existing studies. Additionally, we have outlined a roadmap highlighting potential opportunities that we believe are pertinent and crucial for future research endeavors.
Software Engineering
### What problem does this paper attempt to solve?

This paper addresses a gap in existing research: the lack of a systematic literature review focused on the application of large language models (LLMs) to software vulnerability detection and repair. Its goals are:

1. **Summarizing existing research findings**: Providing a comprehensive summary of the literature on using LLMs for vulnerability detection and repair, covering 58 high-quality research papers published from 2018 to 2024.
2. **Classifying technical methods**: Classifying and describing the techniques used to adapt LLMs to vulnerability detection and repair, including fine-tuning, prompt engineering, and retrieval augmentation.
3. **Identifying limitations of existing research**: Analyzing the shortcomings of current research, including issues in datasets, model design, and deployment strategies.
4. **Proposing future research directions**: Building on those limitations to propose research directions and opportunities that can guide subsequent studies.

### Background Information

- **Software vulnerabilities**: Defects or weaknesses in software systems that can be exploited by attackers.
- **Limitations of traditional methods**: Traditional methods for vulnerability detection and repair (such as rule-based detectors or program analysis-based repair tools) suffer from high false positive rates and cannot handle diverse types of vulnerabilities.
- **Advantages of LLMs**: Through large-scale pre-training, LLMs can learn the characteristics of known vulnerabilities and then be applied to detect and repair previously unseen ones.

### Research Methods

- **Literature collection**: Combining manual and automated searches to collect relevant literature from top conferences and journals.
- **Study selection**: Screening for high-quality papers that meet the research objectives, based on strict inclusion and exclusion criteria.
- **Quality assessment**: Scoring the selected papers against five quality assessment criteria to ensure the reliability and validity of the review.

### Main Contributions

- **Systematic review**: A systematic review of 58 primary studies covering the application of LLMs to vulnerability detection and repair.
- **Technical classification**: A detailed classification of the techniques used to adapt LLMs to these tasks, giving researchers a clear reference.
- **Limitation analysis**: An in-depth analysis of the limitations of existing research.
- **Future research roadmap**: A set of future research opportunities and directions to guide further development in the field.

Together, these contributions fill the identified gap and give future studies a structured starting point.