Peculiar: Smart Contract Vulnerability Detection Based on Crucial Data Flow Graph and Pre-training Techniques

Yihao Qin,Bo Lin,Shangwen Wang,Zhuo Zhang,Xiaoguang Mao,Yan Lei,Haoyu Zhang,Hongjun Wu
DOI: https://doi.org/10.1109/ISSRE52982.2021.00047
2021-10-01
Abstract:Smart contracts with natural economic attributes have been widely and rapidly developed in various fields. However, the bugs and vulnerabilities in smart contracts have brought huge economic losses, which has strengthened people's attention to the security issues of smart contracts. The immutability of smart contracts makes people more willing to conduct security checks before deploying smart contracts. Nonetheless, existing smart contract vulnerability detection techniques are far away from enough: static analysis approaches rely heavily on manually crafted heuristics which is difficult to reuse across different types of vulnerabilities while deep learning based approaches also have unique limitations. In this study, we propose a novel approach, Peculiar, which uses Pre-training technique for detection of smart contract vulnerabilities based on crucial data flow graph. Compared against the traditional data flow graph which is already utilized in existing approach, crucial data flow graph is less complex and does not bring an unnecessarily deep hierarchy, which makes the model easy to focus on the critical features. Moreover, we also involve pre-training technique in our model due to the dramatic improvements it has achieved on a variety of NLP tasks. Our empirical results show that Peculiar can achieve 91.80 % precision and 92.40 % recall in detecting reentrancy vulnerability, one of the most severe and common smart contract vulnerabilities, on 40,932 smart contract files, which is significantly better than the state-of-the-art methods (e.g., Smartcheck achieves 79.37% precision and 70.50% recall). Meanwhile, another experiment shows that Peculiar is more discerning to reentrancy vulnerability than existing approaches. The ablation experiment reveals that both crucial data flow graph and pre-trained model contribute significantly to the performances of Peculiar.
Computer Science
What problem does this paper attempt to address?