Abstract: While automated vulnerability detection techniques have made promising progress in detecting security vulnerabilities, their scalability and applicability remain challenging. The remarkable performance of Large Language Models (LLMs), such as GPT-4 and CodeLlama, on code-related tasks has prompted recent works to explore whether LLMs can be used to detect vulnerabilities. In this paper, we perform a more comprehensive study by concurrently examining a larger number of datasets, languages and LLMs, and qualitatively evaluating performance across prompts and vulnerability classes while addressing the shortcomings of existing tools. Concretely, we evaluate the effectiveness of 16 pre-trained LLMs on 5,000 code samples from five diverse security datasets. These balanced datasets encompass both synthetic and real-world projects in Java and C/C++ and cover 25 distinct vulnerability classes. Overall, LLMs across all scales and families show modest effectiveness in detecting vulnerabilities, obtaining an average accuracy of 62.8% and F1 score of 0.71 across datasets. They are significantly better at detecting vulnerabilities that require only intra-procedural analysis, such as OS Command Injection and NULL Pointer Dereference. Moreover, they report higher accuracies on these vulnerabilities than popular static analysis tools, such as CodeQL. We find that advanced prompting strategies that involve step-by-step analysis significantly improve the performance of LLMs on real-world datasets in terms of F1 score (by up to 0.18 on average). Interestingly, we observe that LLMs show promising abilities at performing parts of the analysis correctly, such as identifying vulnerability-related specifications and leveraging natural language information to understand code behavior (e.g., to check if code is sanitized). We expect our insights to guide future work on LLM-augmented vulnerability detection systems.

Automated Smart Contract Vulnerability Detection using Fine-tuned Large Language Models

Smart Contract Vulnerability Detection: The Role of Large Language Model (LLM)

Smart Contract Vulnerability Detection Technique: A Survey

Smart-LLaMA: Two-Stage Post-Training of Large Language Models for Smart Contract Vulnerability Detection and Explanation

VDDL: A Deep Learning-Based Vulnerability Detection Model for Smart Contracts

LLM-SmartAudit: Advanced Smart Contract Vulnerability Detection

LLM4Fuzz: Guided Fuzzing of Smart Contracts with Large Language Models

Do you still need a manual smart contract audit?

Detection Made Easy: Potentials of Large Language Models for Solidity Vulnerabilities

Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives

Smart Contract Vulnerability Detection Based on Deep Learning and Multimodal Decision Fusion

Robust Vulnerability Detection in Solidity-Based Ethereum Smart Contracts Using Fine-Tuned Transformer Encoder Models

SmartLLMSentry: A Comprehensive LLM Based Smart Contract Vulnerability Detection Framework

A Smart Contract Vulnerability Detection Method Based on Multimodal Feature Fusion and Deep Learning

Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection

Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities

Deep learning-based methodology for vulnerability detection in smart contracts

ConvMHSA-SCVD: Enhancing Smart Contract Vulnerability Detection Through a Knowledge-Driven and Data-Driven Framework

Retrieval Augmented Generation Integrated Large Language Models in Smart Contract Vulnerability Detection