Abstract: Unlike the flow structure of natural languages, programming languages have an inherent rigidity in structure and grammar. However, existing detection methods based on pre-trained models typically treat code as a natural language sequence, ignoring its unique structural information. This hinders the models from understanding the code's semantic and structural information. To address this problem, we introduce the Code Structure-Aware Network through Line-level Semantic Learning (CSLS), which comprises four components: code preprocessing, global semantic awareness, line semantic awareness, and line semantic structure awareness. The preprocessing step transforms the code into two types of text: global code text and line-level code text. Unlike typical preprocessing methods, CSLS retains structural elements such as newlines and indent characters to enhance the model's perception of code lines during global semantic awareness. For line semantic structure awareness, the CSLS network emphasizes capturing structural relationships between line semantics. Different from the structural modeling methods based on code blocks (control flow graphs) or tokens, CSLS uses line semantics as the minimum structural unit to learn nonlinear structural relationships, thereby improving the accuracy of code vulnerability detection. We conducted extensive experiments on vulnerability detection datasets from real projects. The CSLS model outperforms the state-of-the-art baselines in code vulnerability detection, achieving 70.57% accuracy on the Devign dataset and a 49.59% F1 score on the Reveal dataset.
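The preprocessing idea in the abstract (one global text plus a line-level view, both keeping newlines and indentation) can be illustrated with a minimal sketch. The function name `preprocess`, the decision to skip blank lines, and the trimming of trailing whitespace are assumptions made for illustration; the abstract does not specify these details, and this is not the authors' implementation.

```python
def preprocess(code):
    """Return (global_text, line_texts) for one code snippet.

    Hypothetical helper: the name and the exact normalization rules are
    assumptions, not taken from the paper.
    """
    # Normalize line endings only; newlines and leading indentation are kept.
    normalized = code.replace("\r\n", "\n").replace("\r", "\n")
    global_text = normalized
    # Line-level view: one entry per non-blank line, with leading indentation
    # preserved and trailing whitespace dropped (an assumption).
    line_texts = [line.rstrip() for line in normalized.split("\n") if line.strip()]
    return global_text, line_texts


snippet = "int f(int n) {\n    if (n > 0)\n        return n;\n    return 0;\n}\n"
global_text, line_texts = preprocess(snippet)
print(repr(global_text))
for line in line_texts:
    print(repr(line))
```

In a setup like the one described, the global text would go to one encoder as a single sequence, while each entry of `line_texts` would be encoded separately to obtain per-line representations.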


### What problem does this paper attempt to solve?

This paper addresses the fact that existing code vulnerability detection methods ignore code structure information. Specifically, detection methods based on pre-trained models usually treat code as a natural-language sequence and ignore the inherent structure and syntax of programming languages. This limits the model's ability to understand the semantic and structural information in code, which in turn reduces the accuracy of vulnerability detection.

To solve this problem, the authors propose a new framework, **Code Structure-Aware Network through Line-level Semantic Learning (CSLS)**. The framework enhances code structure awareness through four main components:

1. **Code Preprocessing**:
   - Convert the code into two text forms: global code text and line-level code text.
   - Retain structural elements (such as line breaks and indentation characters) during preprocessing to enhance the model's understanding of code lines.
2. **Global Semantic Awareness**:
   - Use a pre-trained model to process the global code text and capture global semantic and structural information.
3. **Line Semantic Awareness**:
   - Use a pre-trained model to process the line-level code text and capture the semantics of each line of code.
4. **Line Semantic Structure Awareness**:
   - Use a Transformer module to model the line-level semantic structure in code fragments and learn nonlinear structural relationships.

With these changes, the CSLS model outperforms existing baseline models on vulnerability detection datasets from real-world projects. In particular, it achieves an accuracy of 70.57% on the Devign dataset and an F1 score of 49.59% on the Reveal dataset. The experimental results indicate that retaining and using code structure information is important for improving the performance of code vulnerability detection models.

### Summary

The core problem of this paper is that existing vulnerability detection methods fail to fully use the structural information of code, so the model's understanding of code semantics and structure is insufficient. To address this, the authors propose the CSLS framework, which improves the accuracy of code vulnerability detection through multi-level semantic and structure awareness.
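To make the idea of relating line-level semantics to one another more concrete, here is a toy sketch of single-head scaled dot-product self-attention over per-line vectors, the core operation inside a Transformer layer. It is only an illustration under stated assumptions: the vectors are made up, the query, key, and value projections are the identity, and the names `softmax` and `self_attention` are hypothetical. It does not reproduce the CSLS architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(line_vectors):
    """Single-head scaled dot-product self-attention with identity projections.

    Returns (outputs, weights); weights[i][j] is how much line i attends to line j.
    """
    dim = len(line_vectors[0])
    outputs, weights = [], []
    for query in line_vectors:
        # Similarity of this line to every line, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in line_vectors
        ]
        attn = softmax(scores)
        weights.append(attn)
        # Output is the attention-weighted average of all line vectors.
        outputs.append(
            [sum(a * v[d] for a, v in zip(attn, line_vectors)) for d in range(dim)]
        )
    return outputs, weights


# Made-up 4-dimensional vectors standing in for per-line embeddings (assumption).
line_vectors = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
outputs, weights = self_attention(line_vectors)
for i, row in enumerate(weights):
    print(i, [round(w, 3) for w in row])
```

Because every line attends to every other line regardless of distance, lines 0 and 2 in this toy input weight each other more heavily than the line between them, which is one way to read "nonlinear structural relationships" between lines.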

Line-level Semantic Structure Learning for Code Vulnerability Detection

Function-Level Vulnerability Detection Through Fusing Multi-Modal Knowledge

SCALE: Constructing Structured Natural Language Comment Trees for Software Vulnerability Detection

Vul-LMGNNs: Fusing Language Models and Online-Distilled Graph Neural Networks for Code Vulnerability Detection

Vulnerability Detection for Source Code Using Contextual LSTM

A Hierarchical Deep Neural Network for Detecting Lines of Codes with Vulnerabilities

Vulnerability Detection by Learning from Syntax-Based Execution Paths of Code

Graph Confident Learning for Software Vulnerability Detection

CSGVD: A deep learning approach combining sequence and graph embedding for source code vulnerability detection

Representation vs. Model: What Matters Most for Source Code Vulnerability Detection

Survey of Source Code Vulnerability Analysis Based on Deep Learning

DeepVulSeeker: A novel vulnerability identification framework via code graph structure and pre-training mechanism

LineVD: Statement-level Vulnerability Detection using Graph Neural Networks

SAFE: Advancing Large Language Models in Leveraging Semantic and Syntactic Relationships for Software Vulnerability Detection

Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs

Detecting code vulnerabilities by learning from large-scale open source repositories

MSGVUL: Multi-semantic integration vulnerability detection based on relational graph convolutional neural networks

Automated Code-centric Software Vulnerability Assessment: How Far Are We? An Empirical Study in C/C++

Software Vulnerability Mining and Analysis Based on Deep Learning

Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection

GRACE: Empowering LLM-based software vulnerability detection with graph structure and in-context learning