Scalable Defect Detection via Traversal on Code Graph

Zhengyao Liu, Xitong Zhong, Xingjing Deng, Shuo Hong, Xiang Gao, Hailong Sun

2024-06-12

Abstract: Detecting defects and vulnerabilities in the early stage has long been a challenge in software engineering. Static analysis, a technique that inspects code without execution, has emerged as a key strategy to address this challenge. Among recent advancements, the use of graph-based representations, particularly Code Property Graph (CPG), has gained traction due to its comprehensive depiction of code structure and semantics. Despite the progress, existing graph-based analysis tools still face performance and scalability issues. The main bottleneck lies in the size and complexity of CPG, which makes analyzing large codebases inefficient and memory-consuming. Also, query rules used by the current tools can be over-specific. Hence, we introduce QVoG, a graph-based static analysis platform for detecting defects and vulnerabilities. It employs a compressed CPG representation to maintain a reasonable graph size, thereby enhancing the overall query efficiency. Based on the CPG, it also offers a declarative query language to simplify the queries. Furthermore, it takes a step forward to integrate machine learning to enhance the generality of vulnerability detection. For projects consisting of 1,000,000+ lines of code, QVoG can complete analysis in approximately 15 minutes, as opposed to 19 minutes with CodeQL.

Software Engineering

What problem does this paper attempt to address?

This paper addresses the challenge of detecting defects and vulnerabilities early in software development. Existing graph-query-based analysis tools face performance and scalability issues on large codebases. The main bottleneck is the size and complexity of the Code Property Graph (CPG), which makes analysis inefficient and memory-consuming. In addition, the query rules used by current tools can be too specific, so they generalize poorly and produce false positives or false negatives. To solve these problems, the authors propose QVoG, a static analysis platform based on graph queries for detecting defects and vulnerabilities. The main innovations of QVoG include:

1. **Compressed Code Property Graph**: The CPG retains only the necessary information, which reduces the number of nodes and edges and improves query efficiency.
2. **Dedicated Domain-Specific Language**: A declarative, SQL-like DSL simplifies the writing of query rules.
3. **Language-independent Query Interface**: A consistent query interface supports multiple programming languages and reduces the cost of supporting new ones.
4. **Combination of Graph Query and Deep Learning**: Machine learning is used to improve the generalization of queries and the detection accuracy.
5. **Open-source Tool**: QVoG will be fully open source, unlike CodeQL and Joern, which have partially closed-source components.

Through these improvements, QVoG achieves higher efficiency and accuracy on large-scale projects. For example, for a project with 1,500,000 lines of code, QVoG completes CPG extraction in approximately 15 minutes, whereas CodeQL requires 19 minutes; QVoG also consumes much less memory than Joern. In terms of accuracy, QVoG achieves an average precision of 90% and a recall of 95% on the Juliet test suite, outperforming Joern and CodeQL.
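To make the ideas of a compressed CPG and a declarative traversal query more concrete, here is a minimal Python sketch. It is not QVoG's data model or DSL: the `CodeGraph` class, the `compress` and `find_flows` functions, the edge kinds (`"ast"`, `"cfg"`, `"dfg"`), and the toy program are all hypothetical names chosen for illustration. The sketch assumes that compression means collapsing expression-level nodes into their enclosing statement, and that a query names sources, sinks, and optional barriers while the engine performs the traversal.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class CodeGraph:
    """Hypothetical statement-level code graph with labelled edges."""
    labels: dict = field(default_factory=dict)  # node id -> statement text
    edges: dict = field(default_factory=lambda: defaultdict(set))  # id -> {(kind, dst)}


def compress(nodes, edges, stmt_of):
    """Collapse a fine-grained graph to one node per statement.

    nodes:   {node id: source text}
    edges:   iterable of (src, dst, kind)
    stmt_of: {node id: id of the enclosing statement node}
    """
    g = CodeGraph()
    for nid, text in nodes.items():
        if stmt_of[nid] == nid:  # keep statement-level nodes only
            g.labels[nid] = text
    for src, dst, kind in edges:
        s, d = stmt_of[src], stmt_of[dst]
        if s != d:  # edges that stay inside one statement disappear
            g.edges[s].add((kind, d))
    return g


def find_flows(g, is_source, is_sink, is_barrier=lambda text: False):
    """Return (source, sink) pairs connected by data-flow edges.

    The caller only states what counts as a source, sink, or barrier;
    the breadth-first traversal over "dfg" edges is done here.
    """
    results = []
    for start in sorted(n for n, t in g.labels.items() if is_source(t)):
        seen, queue = {start}, deque([start])
        while queue:
            cur = queue.popleft()
            for kind, nxt in sorted(g.edges.get(cur, ())):
                if kind != "dfg" or nxt in seen or is_barrier(g.labels[nxt]):
                    continue
                seen.add(nxt)
                if is_sink(g.labels[nxt]):
                    results.append((start, nxt))
                queue.append(nxt)
    return results


# Toy fine-grained graph: statements 1, 3, 5 and one sub-expression each.
nodes = {
    1: "name = input()", 2: "input()",
    3: 'query = "SELECT ..." + name', 4: '"SELECT ..." + name',
    5: "cursor.execute(query)", 6: "query",
}
stmt_of = {1: 1, 2: 1, 3: 3, 4: 3, 5: 5, 6: 5}
edges = [
    (1, 2, "ast"), (3, 4, "ast"), (5, 6, "ast"),  # statement -> sub-expression
    (1, 3, "cfg"), (3, 5, "cfg"),                 # control flow
    (1, 4, "dfg"), (3, 6, "dfg"),                 # data flow into sub-expressions
]


def is_source(text):
    return "input(" in text


def is_sink(text):
    return "execute(" in text


g = compress(nodes, edges, stmt_of)
flows = find_flows(g, is_source, is_sink)
print(len(nodes), "->", len(g.labels))  # 6 -> 3
print(flows)                            # [(1, 5)]
```

In this toy case the compressed graph has half as many nodes, and the untrusted input in statement 1 is still reported as reaching the `execute` call in statement 5. Passing an `is_barrier` predicate (for example, one that recognizes a sanitizing statement) cuts such paths, which is the kind of rule a declarative query would express.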
