Abstract: Automated detection of software vulnerabilities is a fundamental problem in software security. Existing program analysis techniques suffer from either high false-positive or high false-negative rates. Recent progress in Deep Learning (DL) has resulted in a surge of interest in applying DL to automated vulnerability detection. Several recent studies have reported promising results, with accuracy of up to 95% at detecting vulnerabilities. In this paper, we ask, "how well do the state-of-the-art DL-based techniques perform in a real-world vulnerability prediction scenario?" To our surprise, we find that their performance drops by more than 50%. A systematic investigation of what causes such a precipitous performance drop reveals that existing DL-based vulnerability prediction approaches suffer from challenges with the training data (e.g., data duplication and an unrealistic distribution of vulnerable classes) and with the model choices (e.g., simple token-based models). As a result, these approaches often do not learn features related to the actual cause of the vulnerabilities. Instead, they learn unrelated artifacts from the dataset (e.g., specific variable or function names). Leveraging these empirical findings, we demonstrate how a more principled approach to data collection and model design, based on realistic settings of vulnerability prediction, can lead to better solutions. The resulting tools perform significantly better than the studied baseline: up to a 33.57% boost in precision and a 128.38% boost in recall compared to the best performing model in the literature. Overall, this paper elucidates potential issues in existing DL-based vulnerability prediction systems and draws a roadmap for future DL-based vulnerability prediction research. In that spirit, we make available all the artifacts supporting our results: https://git.io/Jf6IA.
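One of the data issues named in the abstract, an unrealistic distribution of vulnerable classes, can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the detector rates (`TPR`, `FPR`) and the two class ratios are hypothetical values chosen only to show how precision can fall when the same detector moves from a balanced benchmark to a setting where vulnerable functions are rare.

```python
def precision(tpr: float, fpr: float, vulnerable_ratio: float) -> float:
    """Expected precision of a detector with the given true-positive and
    false-positive rates when a fraction `vulnerable_ratio` of the
    functions it sees is actually vulnerable."""
    true_pos = tpr * vulnerable_ratio            # vulnerable and flagged
    false_pos = fpr * (1.0 - vulnerable_ratio)   # benign but flagged
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged > 0 else 0.0


# Hypothetical detector (assumed numbers, not results from the paper).
TPR, FPR = 0.90, 0.10

# Same detector, two different class distributions.
balanced = precision(TPR, FPR, vulnerable_ratio=0.50)   # benchmark-like split
realistic = precision(TPR, FPR, vulnerable_ratio=0.05)  # vulnerabilities are rare

print(f"precision on balanced data:  {balanced:.3f}")
print(f"precision on realistic data: {realistic:.3f}")
```

Under these assumed rates, most flagged functions in the rare-vulnerability setting are false alarms, even though the detector itself is unchanged. This is one reason a score measured on a balanced dataset may not carry over to a realistic prediction scenario.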

Vulnerability Detection via Topological Analysis of Attention Maps

Function-Level Vulnerability Detection Through Fusing Multi-Modal Knowledge

Toward Improved Deep Learning-based Vulnerability Detection

Deep Learning based Vulnerability Detection: Are We There Yet?

VDDL: A Deep Learning-Based Vulnerability Detection Model for Smart Contracts

Meta-Path Based Attentional Graph Learning Model for Vulnerability Detection

Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection

Learning-based Models for Vulnerability Detection: An Extensive Study

Can An Old Fashioned Feature Extraction and A Light-weight Model Improve Vulnerability Type Identification Performance?

VulANalyzeR: Explainable Binary Vulnerability Detection with Multi-task Learning and Attentional Graph Convolution

Robust Backdoor Detection for Deep Learning via Topological Evolution Dynamics

Reliable Malware Analysis and Detection using Topology Data Analysis

An extensive study of the effects of different deep learning models on code vulnerability detection in Python code

VDDA: An Effective Software Vulnerability Detection Model Based on Deep Learning and Attention Mechanism

Vulnerability Detection Using Two-Stage Deep Learning Models

Distinguishing Look-Alike Innocent and Vulnerable Code by Subtle Semantic Representation Learning and Explanation

Vulnerability Detection in C/C++ Code with Deep Learning

On Security Weaknesses and Vulnerabilities in Deep Learning Systems

Experimental Observations of the Topology of Convolutional Neural Network Activations

The Vulnerability Is in the Details: Locating Fine-grained Information of Vulnerable Code Identified by Graph-based Detectors

A Comparative Study of Deep Learning-Based Vulnerability Detection System