Vulnerability detection in Java source code using a quantum convolutional neural network with self-attentive pooling, deep sequence, and graph-based hybrid feature extraction

Shumaila Hussain, Muhammad Nadeem, Junaid Baber, Mohammed Hamdi, Adel Rajab, Mana Saleh Al Reshan, Asadullah Shaikh

DOI: https://doi.org/10.1038/s41598-024-56871-z

IF: 4.6

2024-03-30

Scientific Reports

Abstract: Software vulnerabilities pose a significant threat to system security, necessitating effective automatic detection methods. Current techniques face challenges such as dependency issues, language bias, and coarse detection granularity. This study presents a novel deep learning-based vulnerability detection system for Java code. Leveraging hybrid feature extraction through graph and sequence-based techniques enhances semantic and syntactic understanding. The system utilizes control flow graphs (CFG), abstract syntax trees (AST), program dependencies (PD), and greedy longest-match first vectorization for graph representation. A hybrid neural network (GCN-RFEMLP) and the pre-trained CodeBERT model extract features, feeding them into a quantum convolutional neural network with self-attentive pooling. The system addresses issues like long-term information dependency and coarse detection granularity, employing intermediate code representation and inter-procedural slice code. To mitigate language bias, a benchmark software assurance reference dataset is employed. Evaluations demonstrate the system's superiority, achieving 99.2% accuracy in detecting vulnerabilities, outperforming benchmark methods. The proposed approach comprehensively addresses vulnerabilities, including improper input validation, missing authorizations, buffer overflow, cross-site scripting, and SQL injection attacks listed by common weakness enumeration (CWE).
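The abstract names greedy longest-match first vectorization without spelling it out. The following is a minimal sketch of the general technique (as used in WordPiece-style tokenizers), not the authors' implementation: each code token is split into the longest sub-tokens found in a vocabulary, and each sub-token is mapped to an integer id. The vocabulary `TOY_VOCAB`, the `##` continuation prefix, and the function names are assumptions made for illustration.

```python
def greedy_longest_match(token, vocab, unk="[UNK]"):
    """Split one token into sub-tokens, always trying the longest match first."""
    pieces = []
    start = 0
    while start < len(token):
        end = len(token)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start:
            candidate = token[start:end]
            if start > 0:
                candidate = "##" + candidate  # assumed marker for non-initial pieces
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No prefix of the remainder is known: the whole token is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def vectorize(tokens, vocab, unk="[UNK]"):
    """Map a list of code tokens to a flat list of vocabulary ids."""
    ids = []
    for token in tokens:
        for piece in greedy_longest_match(token, vocab, unk):
            ids.append(vocab[piece])
    return ids


# Hypothetical toy vocabulary; a real one would be learned from a code corpus.
TOY_VOCAB = {
    "[UNK]": 0, "get": 1, "##Parameter": 2, "##Param": 3, "##eter": 4,
    "execute": 5, "##Query": 6, "(": 7, ")": 8,
}

if __name__ == "__main__":
    print(greedy_longest_match("getParameter", TOY_VOCAB))
    print(vectorize(["executeQuery", "(", ")"], TOY_VOCAB))
```

Because the longest candidate is tried first, `getParameter` is split using `##Parameter` rather than the shorter `##Param` followed by `##eter`, even though both splits are possible with this vocabulary.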

multidisciplinary sciences

What problem does this paper attempt to address?

The paper aims to address several key issues in software vulnerability detection, particularly for Java source code. Current vulnerability detection techniques face some challenges, including dependency issues, language bias, and coarse-grained detection. This paper proposes a novel vulnerability detection system based on deep learning to address these issues through the following methods:

1. **Hybrid Feature Extraction**: Enhancing semantic and syntactic understanding by combining graph and sequence techniques, utilizing Control Flow Graph (CFG), Abstract Syntax Tree (AST), Program Dependency Graph (PD), and Greedy Longest Match Vectorization for graph representation.
2. **Quantum Convolutional Neural Network and Self-Attention Pooling**: Introducing Quantum Convolutional Neural Network (QCNN) and self-attention pooling mechanisms to improve long-term information dependency and fine-grained detection capabilities.
3. **Pre-trained Model**: Using the pre-trained CodeBERT model for feature extraction to reduce semantic gaps and improve the accuracy of vulnerability detection.
4. **Dataset Balancing**: Using the Software Assurance Reference Dataset (SARD) for model training and testing, and preprocessing the dataset to optimize results.

Through these methods, the system can effectively detect various types of vulnerabilities, including improper input validation, SQL injection attacks, missing authorization, cross-site scripting attacks, and buffer overflow attacks. Experimental results show that the detection accuracy of this system reaches 99.2%, significantly outperforming existing benchmark methods.
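The summary does not say how the self-attention pooling layer is defined. As a rough illustration of the general idea only, the sketch below pools a sequence of feature vectors into one vector by scoring each position against a query vector, normalizing the scores with a softmax, and taking the weighted sum. The function names, the dot-product scoring, and the fixed `query` vector (which would be a learned parameter in a trained model) are assumptions, not details taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attentive_pool(features, query):
    """Pool a sequence of equal-length vectors into one vector.

    features: list of feature vectors, one per sequence position.
    query:    scoring vector (assumed here; learned in a real model).
    Returns (pooled_vector, attention_weights).
    """
    # One scalar relevance score per position (dot product with the query).
    scores = [sum(h_j * q_j for h_j, q_j in zip(h, query)) for h in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Weighted sum over positions, computed per feature dimension.
    pooled = [sum(w * h[j] for w, h in zip(weights, features)) for j in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    pooled, weights = self_attentive_pool(feats, query=[2.0, 0.0])
    print(pooled, weights)
```

Unlike max or average pooling over a fixed window, every position contributes to the output in proportion to its weight, so a relevant statement far from the end of a code slice can still dominate the pooled representation.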

Automated software vulnerability detection with machine learning

A new method of software vulnerability detection based on a quantum neural network

SQVDT: A Scalable Quantitative Vulnerability Detection Technique for Source Code Security Assessment

Meta-heuristic-based hybrid deep learning model for vulnerability detection and prevention in software system

SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities

Software Vulnerability Mining and Analysis Based on Deep Learning

Systematic Analysis of Deep Learning Model for Vulnerable Code Detection

Vulnerability Detection Using Two-Stage Deep Learning Models

Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation

Software Vulnerability Detection Using Deep Neural Networks: A Survey

Vulnerability Detection in C/C++ Code with Deep Learning

Automated Vulnerability Detection Using Deep Learning Technique

JFinder: A Novel Architecture for Java Vulnerability Identification Based Quad Self-Attention and Pre-training Mechanism

Transformer-based Vulnerability Detection in Code at EditTime: Zero-shot, Few-shot, or Fine-tuning?

DeepVulSeeker: A novel vulnerability identification framework via code graph structure and pre-training mechanism

Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities

Binary Program Vulnerability Mining Based on Neural Network

Multi-context Attention Fusion Neural Network for Software Vulnerability Identification

An Unbiased Transformer Source Code Learning with Semantic Vulnerability Graph

Vulnerability Detection via Multiple-Graph-Based Code Representation