Abstract: While smart contracts are foundational elements of blockchain applications, their inherent susceptibility to security vulnerabilities poses a significant challenge. Existing training datasets employed for vulnerability detection tools may be limited, potentially compromising the tools' efficacy. This paper presents a method for improving the quantity and quality of smart contract vulnerability datasets and evaluates current detection methods. The approach centers on semantic-preserving code transformation, a technique that modifies the source code structure without altering its semantic meaning. The transformed code snippets are inserted into all potential locations within benign smart contract code, creating new vulnerable contract versions. This method aims to generate a wider variety of vulnerable code, including samples that can bypass detection by current analysis tools. The paper's experiments evaluate the method's effectiveness using tools like Slither, Mythril, and CrossFuzz, focusing on metrics like the number of generated vulnerable samples and the false negative rate in detecting these vulnerabilities. The results show that many newly created vulnerabilities can bypass these tools, that the false reporting rate rises to as much as 100%, and that the dataset size increases by at least 2.5X.

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper aims to solve two main problems in smart contract vulnerability detection:

1. **Limited quality and quantity of existing data sets**: The existing data sets used to train vulnerability detection tools may be insufficient, resulting in limited effectiveness of these tools. In particular, high-quality manually-verified data sets usually contain less than 1,000 vulnerability source-code samples, while larger-scale data sets rely on the verification of static analysis tools, which may not be reliable enough.
2. **Sensitivity of existing detection tools to code variation**: Since smart contract code can achieve the same semantic function (i.e., semantic equivalence) through multiple structures, traditional analysis tools may not be able to effectively identify vulnerabilities when faced with these code variations. Therefore, a method is needed to generate more diverse vulnerability codes to evaluate and improve the detection capabilities of existing tools.

### Specific objectives of the paper

- **RQ1: How do code transformations affect vulnerability detection tools?**
  - By evaluating the quantity and quality of vulnerability examples in the newly generated data set, study whether code transformations can significantly increase the number of vulnerabilities that existing tools fail to detect (i.e., increase the false-negative rate).
- **RQ2: Is the method of injecting vulnerabilities at all potential locations more effective than at only one location?**
  - Compare the effects of injecting vulnerabilities at a single location and at multiple potential locations, and evaluate which method can better reveal the limitations of existing tools.

### Solutions

The paper proposes a method based on semantic-preserving code transformation. The specific steps are as follows:

1. **Code transformation**: Convert the source code containing vulnerabilities into an intermediate representation form (such as an abstract syntax tree, AST), and then apply a series of transformation operations (such as variable renaming, function renaming, expression substitution, conditional branch swapping, etc.), ensuring that the transformed code still maintains the original semantics.
2. **Vulnerability injection**: Insert the transformed vulnerability code fragments into all potential locations of the smart contract source code to generate new contract versions containing vulnerabilities.
3. **Experimental evaluation**: Use multiple vulnerability detection tools (such as Slither, Mythril, CrossFuzz) to evaluate the newly generated data set, focusing on indicators such as the number of generated vulnerabilities and the false-negative rate.

Through this method, the paper not only increases the quantity of the vulnerability data set but also improves its quality, enabling it to better test and improve the existing vulnerability detection tools.
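The three steps above can be illustrated with a small Python sketch. It is not the paper's implementation: the paper describes transformations on an intermediate representation such as an AST, whereas this sketch uses regular expressions on source text as a simplified stand-in. The Solidity fragment, the benign contract, the replacement names (`payout`, `amt`), and `toy_detector` are all hypothetical and invented for illustration. The detector is a deliberately brittle signature matcher, not Slither, Mythril, or CrossFuzz.

```python
import re

# Hypothetical vulnerable fragment and benign contract (not taken from the paper).
VULN_SNIPPET = '''function withdraw(uint amount) public {
    require(balances[msg.sender] >= amount);
    msg.sender.call{value: amount}("");
    balances[msg.sender] -= amount;
}'''

BENIGN_CONTRACT = '''contract Wallet {
    mapping(address => uint) balances;

    function deposit() public payable {
        balances[msg.sender] += msg.value;
    }

    function balanceOf(address who) public view returns (uint) {
        return balances[who];
    }
}'''


def rename_identifier(code, old, new):
    """Rename whole-word occurrences of an identifier (text-level approximation)."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)


def rewrite_ge_guard(code):
    """Expression substitution: a >= b becomes !(a < b), equivalent for uint operands."""
    return re.sub(r"require\((.+?) >= (.+?)\);", r"require(!(\1 < \2));", code)


def transform(snippet):
    """Step 1: apply a few semantic-preserving rewrites to the vulnerable fragment."""
    out = rename_identifier(snippet, "withdraw", "payout")  # function renaming
    out = rename_identifier(out, "amount", "amt")           # variable renaming
    return rewrite_ge_guard(out)


def injection_points(contract):
    """Indices of non-blank lines after which brace depth is 1,
    i.e. inside the contract body but outside any function."""
    points, depth = [], 0
    for i, line in enumerate(contract.splitlines()):
        depth += line.count("{") - line.count("}")
        if depth == 1 and line.strip():
            points.append(i)
    return points


def inject_everywhere(contract, snippet):
    """Step 2: build one contract variant per injection point."""
    lines = contract.splitlines()
    body = ["    " + s for s in snippet.splitlines()]
    return ["\n".join(lines[: i + 1] + body + lines[i + 1:])
            for i in injection_points(contract)]


def toy_detector(contract):
    """Hypothetical signature-based detector: flags an external call followed
    by the balance update, but only when written with the original names."""
    pattern = r"call\{value: amount\}.*\n\s*balances\[msg\.sender\] -= amount"
    return re.search(pattern, contract) is not None


def false_negative_rate(vulnerable_samples, detector):
    """Step 3: fraction of known-vulnerable samples the detector fails to flag."""
    missed = sum(1 for s in vulnerable_samples if not detector(s))
    return missed / len(vulnerable_samples)


original = inject_everywhere(BENIGN_CONTRACT, VULN_SNIPPET)
mutated = inject_everywhere(BENIGN_CONTRACT, transform(VULN_SNIPPET))

print(len(original), "variants per snippet")
print("FNR on untransformed variants:", false_negative_rate(original, toy_detector))
print("FNR on transformed variants:  ", false_negative_rate(mutated, toy_detector))
```

By construction, the toy detector flags every untransformed variant and misses every transformed one, so the numbers it prints say nothing about real tools. The sketch only shows the mechanics: one fragment yields one sample per injection point, and renaming plus expression substitution can defeat a matcher that depends on surface form.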

Impact of Code Transformation on Detection of Smart Contract Vulnerabilities
