Static Application Security Testing (SAST) Tools for Smart Contracts: How Far Are We?

Kaixuan Li, Yue Xue, Sen Chen, Han Liu, Kairan Sun, Ming Hu, Haijun Wang, Yang Liu, Yixiang Chen
DOI: https://doi.org/10.1145/3660772
2024-06-29
Abstract: In recent years, the importance of smart contract security has been heightened by the increasing number of attacks against them. To address this issue, a multitude of static application security testing (SAST) tools have been proposed for detecting vulnerabilities in smart contracts. However, objectively comparing these tools to determine their effectiveness remains challenging. Existing studies often fall short due to the taxonomies and benchmarks only covering a coarse and potentially outdated set of vulnerability types, which leads to evaluations that are not entirely comprehensive and may display bias. In this paper, we fill this gap by proposing an up-to-date and fine-grained taxonomy that includes 45 unique vulnerability types for smart contracts. Taking it as a baseline, we develop an extensive benchmark that covers 40 distinct types and includes a diverse range of code characteristics, vulnerability patterns, and application scenarios. Based on them, we evaluated 8 SAST tools using this benchmark, which comprises 788 smart contract files and 10,394 vulnerabilities. Our results reveal that the existing SAST tools fail to detect around 50% of vulnerabilities in our benchmark and suffer from high false positives, with precision not surpassing 10%. We also discover that by combining the results of multiple tools, the false negative rate can be reduced effectively, at the expense of flagging 36.77 percentage points more functions. Nevertheless, many vulnerabilities, especially those beyond Access Control and Reentrancy vulnerabilities, remain undetected. We finally highlight the valuable insights from our study, hoping to provide guidance on tool development, enhancement, evaluation, and selection for developers, researchers, and practitioners.
Software Engineering
What problem does this paper attempt to address?
This paper asks how to objectively evaluate the effectiveness and limitations of existing Static Application Security Testing (SAST) tools for smart contract vulnerability detection. Specifically, it aims to:

1. **Construct an up-to-date, fine-grained smart contract vulnerability taxonomy**: Existing taxonomies usually cover coarse and outdated vulnerability types and do not cover all currently known ones. To address this, the authors propose a taxonomy containing 45 unique vulnerability types.
2. **Develop a comprehensive and diverse benchmark**: Existing benchmarks are either small or cover a limited number of vulnerability types, which can bias evaluation results. The authors construct a benchmark of 788 smart contract files and 10,394 vulnerabilities, covering 40 distinct vulnerability types, by collecting existing high-quality datasets and manually annotating a large number of smart contract projects.
3. **Evaluate the performance of multiple SAST tools**: Using the new taxonomy and benchmark, the authors evaluate 8 representative SAST tools along four dimensions:
   - **Coverage analysis**: Which vulnerability types each tool can detect.
   - **Effectiveness analysis**: How well each tool performs, measured by recall, precision, and F1-score.
   - **Consistency analysis**: What is gained and lost when multiple tools are used in combination.
   - **Efficiency analysis**: The time cost and resource consumption of each tool when processing large-scale smart contracts.

### Main Findings

- **In terms of coverage**: CSA and Securify2 show higher vulnerability coverage.
- **In terms of effectiveness**: CSA performs best in recall and precision, while Slither's performance declines due to a high false positive rate. However, existing tools still fail to detect approximately 50% of vulnerabilities, and precision does not exceed 10%.
- **In terms of consistency**: Combining the results of multiple tools reduces the false negative rate to 29.3%, but at the cost of flagging 36.77 percentage points more functions.
- **In terms of efficiency**: Tools using symbolic execution techniques (such as Manticore) take a long time, while static analysis tools (such as SmartCheck) are faster.

### Conclusions and Contributions

The main contributions of this paper are:

1. Proposing a new smart contract vulnerability taxonomy covering 45 unique vulnerability types.
2. Constructing a benchmark of 788 smart contract files and 10,394 vulnerabilities, which is currently the largest smart contract vulnerability benchmark.
3. Conducting a large-scale evaluation of 8 SAST tools, providing in-depth insights into these tools in terms of coverage, effectiveness, consistency, and efficiency.

Through these efforts, the authors hope to provide guidance on tool development, enhancement, evaluation, and selection for developers, researchers, and practitioners, thereby promoting further progress in smart contract security.
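
The effectiveness and consistency metrics discussed above can be made concrete with a small sketch. It is not the authors' evaluation pipeline. It assumes each finding is identified by a hypothetical `file:function` string, that a report counts as a true positive when the same identifier appears in the ground truth, and that tools are combined by taking the union of their reports. All names and data below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    f1: float


def score(reported: set[str], ground_truth: set[str]) -> Metrics:
    """Compute precision, recall, and F1 for one set of reported findings."""
    true_positives = len(reported & ground_truth)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(precision, recall, f1)


def combine(*tool_reports: set[str]) -> set[str]:
    """Combine tools by union: a function is flagged if any tool flags it."""
    return set().union(*tool_reports)


# Hypothetical ground truth and tool outputs (assumed data, not from the paper).
ground_truth = {"A.sol:withdraw", "A.sol:init", "B.sol:mint", "C.sol:kill"}
tool_a = {"A.sol:withdraw", "A.sol:init", "D.sol:ping"}
tool_b = {"B.sol:mint", "D.sol:pong", "E.sol:noop"}

for name, report in [("tool_a", tool_a), ("tool_b", tool_b),
                     ("combined", combine(tool_a, tool_b))]:
    m = score(report, ground_truth)
    # False negative rate is the share of real vulnerabilities that were missed.
    print(f"{name}: flagged={len(report)} precision={m.precision:.2f} "
          f"recall={m.recall:.2f} f1={m.f1:.2f} fnr={1 - m.recall:.2f}")
```

In this toy data the union misses fewer vulnerabilities than either tool alone but flags twice as many functions, which mirrors the trade-off the consistency analysis describes.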