Abstract: Though many deep learning-based models have made great progress in vulnerability detection, these models remain poorly understood, which limits further advances in model capability, understanding of the detection mechanism, and the efficiency and safety of their practical application. In this paper, we extensively and comprehensively investigate two types of state-of-the-art learning-based approaches (sequence-based and graph-based) by conducting experiments on a recently built large-scale dataset. We investigate seven research questions across five dimensions, namely model capabilities, model interpretation, model stability, ease of use, and model economy. We experimentally demonstrate the superiority of sequence-based models and the limited abilities of both an LLM (ChatGPT) and graph-based models. We explore the types of vulnerability that learning-based models are skilled at detecting and reveal that the models are unstable even when the input is changed subtly in semantically equivalent ways. We empirically explain what the models have learned. We summarize the pre-processing steps and the requirements for easily using the models. Finally, we derive preliminary insights into the information vital for using these models economically and safely in practice.

### What problems does this paper attempt to solve?

This paper aims to solve the following problems:

1. **Insufficient understanding of deep-learning models in vulnerability detection**
   - Although many deep-learning-based models have made significant progress in vulnerability detection, our understanding of these models is still limited. This restricts further improvement of model capabilities, understanding of model detection mechanisms, and improvement of model efficiency and security in practical applications.
2. **Evaluating the performance of different types of deep-learning models in vulnerability detection**
   - The paper experimentally compares two types of state-of-the-art learning-based methods (sequence-based models and graph-based models) and answers seven research questions, covering five dimensions: model capabilities, model interpretation, model stability, model usability, and model economy.
3. **Exploring the potential of large language models (LLMs) in vulnerability detection**
   - The paper specifically examines the performance of large language models (such as ChatGPT) in vulnerability detection and explores how performance changes under different prompt settings.
4. **Revealing the limitations and improvement directions of existing models**
   - By analyzing detection capabilities across different types of vulnerabilities, the paper reveals the advantages and disadvantages of existing models and provides guidance for future improvements.

### Specific research questions

The paper proposes the following seven specific research questions (RQs) to comprehensively evaluate and understand learning-based vulnerability detection models:

- **RQ-1**: How do learning-based methods perform in vulnerability detection? What is the variability between different models?
- **RQ-2**: What types of vulnerabilities are learning-based methods good at detecting?
- **RQ-3**: Can large language models (such as ChatGPT) detect vulnerabilities?
- **RQ-4**: What source-code information do learning-based models focus on? Do different types of learning-based models agree on similar code features?
- **RQ-5**: When the input changes slightly, do learning-based models consistently detect vulnerabilities?
- **RQ-6**: What efforts are required before using the models? In which scenarios can learning-based models be applied?
- **RQ-7**: From the time and economic perspectives, what costs will be incurred by adopting these models?

### Main contributions

- **Extensive comparison**: The paper makes an extensive comparison of various learning-based vulnerability detection methods, including ChatGPT.
- **Comprehensive research dimensions**: Seven research questions are designed, divided into five important dimensions, to comprehensively understand learning-based methods.
- **Open-source reproduction package**: A reproduction package is released for further research.

Through the answers to these questions, the paper provides valuable insights and guidance for better developing and applying vulnerability detection models.
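The stability question (RQ-5) asks whether a model gives the same verdict when its input is changed in a semantically equivalent way. As a rough illustration only, the sketch below measures that kind of consistency for one such change, renaming an identifier. Everything in it is an assumption made for the example rather than the paper's procedure: `rename_identifier`, `consistency_rate`, and the deliberately brittle `toy_detector` are hypothetical names, and the two C snippets are made-up inputs.

```python
import re
from typing import Callable, Iterable


def rename_identifier(code: str, old: str, new: str) -> str:
    """Rename whole-word occurrences of `old` to `new`.

    Assumption: `new` is not already used in the snippet, so the edit
    preserves program semantics.
    """
    return re.sub(rf"\b{re.escape(old)}\b", new, code)


def consistency_rate(
    detector: Callable[[str], bool],
    samples: Iterable[str],
    transform: Callable[[str], str],
) -> float:
    """Fraction of samples whose prediction is unchanged by `transform`."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample")
    unchanged = sum(detector(s) == detector(transform(s)) for s in samples)
    return unchanged / len(samples)


def toy_detector(code: str) -> bool:
    """Hypothetical stand-in for a learned model.

    It keys on the variable name "buf", so it is intentionally sensitive
    to renaming; a real model would be queried here instead.
    """
    return "strcpy" in code and "buf" in code


# Made-up inputs for illustration.
samples = [
    "char buf[8]; strcpy(buf, src);",
    "int n = a + b;",
]

rate = consistency_rate(
    toy_detector,
    samples,
    lambda code: rename_identifier(code, "buf", "dst"),
)
print(rate)  # 0.5: the first snippet's verdict flips after renaming
```

A rate below 1.0 means some predictions changed even though program behavior did not, which is the kind of instability RQ-5 is concerned with. Other semantics-preserving edits (for example, reordering independent statements) could be plugged in as `transform` in the same way.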

Learning-based Models for Vulnerability Detection: An Extensive Study

Function-Level Vulnerability Detection Through Fusing Multi-Modal Knowledge

Ignnvd: A Novel Software Vulnerability Detection Model Based on Integrated Graph Neural Networks

Large Language Model for Vulnerability Detection: Emerging Results and Future Directions

An empirical study of text-based machine learning models for vulnerability detection

A Comparative Study of Deep Learning-Based Vulnerability Detection System

Understanding and Tackling Label Errors in Deep Learning-Based Vulnerability Detection (Experience Paper)

Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection

An extensive study of the effects of different deep learning models on code vulnerability detection in Python code

Explaining the Contributing Factors for Vulnerability Detection in Machine Learning

How Far Have We Gone in Vulnerability Detection Using Large Language Models

Revisiting the Performance of Deep Learning-Based Vulnerability Detection on Realistic Datasets

Vulnerability Detection with Code Language Models: How Far Are We?

VDDL: A Deep Learning-Based Vulnerability Detection Model for Smart Contracts

Deep Learning based Vulnerability Detection: Are We There Yet?

DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection

VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models

A Survey on Automated Software Vulnerability Detection Using Machine Learning and Deep Learning

Toward Improved Deep Learning-based Vulnerability Detection

Vulnerability Detection with Graph Attention Network and Metric Learning