Optimizing smart contract vulnerability detection via multi-modality code and entropy embedding
Dawei Yuan, Xiaohui Wang, Yao Li, Tao Zhang
DOI: https://doi.org/10.1016/j.jss.2023.111699
IF: 3.5
2023-04-08
Journal of Systems and Software
Abstract: Smart contracts have been widely used in the blockchain world in recent years, and vulnerability detection has gained increasing attention because of the staggering economic losses caused by attackers. Existing tools that analyze smart contract vulnerabilities rely heavily on rules predefined by experts, which are labour-intensive to write and require domain knowledge. Moreover, predefined rules can themselves be mistaken, which increases the risk of crafty back-doors in the future. Recently, researchers have mainly used static and dynamic execution analysis to detect smart contract vulnerabilities and have achieved acceptable results. However, dynamic methods cannot cover all program inputs and execution paths, so some vulnerabilities remain hard to detect. Static analysis methods, which commonly include symbolic execution and theorem proving, rely on constraints to detect vulnerabilities. These shortcomings make traditional methods difficult to apply and scale up. This paper aims to detect vulnerabilities using the Bug Injection framework and transfer learning techniques. First, we train a Transformer encoder on multi-modality code consisting of Solidity source code, its intermediate representation, and assembly code. Specifically, we translate the source code into the intermediate representation and decompile the bytecode into assembly code using the EVM compiler. Then, we propose a novel entropy embedding technique that combines the token, segment, and positional embeddings of the Transformer encoder. After that, we use the Bug Injection framework to automatically generate specific types of buggy code for fine-tuning and for evaluating vulnerability detection performance.
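The abstract says the encoder input mixes three code modalities and that the embedding combines token, segment, and positional embeddings, but it does not say how the entropy component is computed. The sketch below therefore shows only the familiar BERT-style part, under stated assumptions: one segment id per modality (source = 0, IR = 1, assembly = 2), random lookup tables standing in for learned weights, and a tiny dimension. All names (`build_input`, `InputEmbedding`, `MODALITY_SEGMENTS`) are hypothetical and the entropy term is deliberately left out; this is not the authors' implementation.

```python
import random

# Hypothetical segment ids: one per code modality named in the abstract.
MODALITY_SEGMENTS = {"source": 0, "ir": 1, "asm": 2}
UNK = "[UNK]"


def build_input(source_tokens, ir_tokens, asm_tokens):
    """Concatenate the three modalities and tag each token with its segment id."""
    tokens, segments = [], []
    for name, toks in (("source", source_tokens), ("ir", ir_tokens), ("asm", asm_tokens)):
        tokens.extend(toks)
        segments.extend([MODALITY_SEGMENTS[name]] * len(toks))
    return tokens, segments


def random_table(rows, dim, rng):
    """A rows x dim lookup table of small random values (stands in for learned weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


class InputEmbedding:
    """BERT-style input embedding: token + segment + position, summed element-wise.

    The paper's entropy embedding is NOT modelled here (the abstract gives no formula).
    """

    def __init__(self, vocab, dim=8, max_len=64, seed=0):
        vocab = list(vocab)
        if UNK not in vocab:
            vocab.append(UNK)  # out-of-vocabulary tokens map to this row
        rng = random.Random(seed)
        self.vocab = {tok: i for i, tok in enumerate(vocab)}
        self.max_len = max_len
        self.token_table = random_table(len(vocab), dim, rng)
        self.segment_table = random_table(len(MODALITY_SEGMENTS), dim, rng)
        self.position_table = random_table(max_len, dim, rng)

    def __call__(self, tokens, segments):
        if len(tokens) > self.max_len:
            raise ValueError("sequence longer than max_len")
        vectors = []
        for pos, (tok, seg) in enumerate(zip(tokens, segments)):
            t = self.token_table[self.vocab.get(tok, self.vocab[UNK])]
            s = self.segment_table[seg]
            p = self.position_table[pos]
            vectors.append([a + b + c for a, b, c in zip(t, s, p)])
        return vectors


# Toy tokens: Solidity source, Yul-style IR builtins, EVM opcodes.
tokens, segments = build_input(
    ["function", "withdraw", "(", ")"],
    ["call", "sstore"],
    ["PUSH1", "CALL", "SSTORE"],
)
embed = InputEmbedding(tokens)
vectors = embed(tokens, segments)
print(len(vectors), len(vectors[0]))  # 9 8
```

The segment id is what lets the encoder tell `call` in the IR apart from `CALL` in the assembly even when both sit in the same sequence; an entropy-based term would presumably be added to the same per-position sum, but its definition would have to come from the full paper.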
The experimental results show that our approach improves performance in detecting reentrancy vulnerabilities and timestamp dependence. Moreover, our approach is more flexible and scalable than static and dynamic analysis approaches for detecting smart contract vulnerabilities. Our approach outperforms the baseline approaches by an average of 11.89% in terms of F1 score.
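F1 score is the headline metric here, and the abstract does not state whether the 11.89% average improvement is measured in percentage points or relative to the baseline. The sketch below computes F1 from confusion-matrix counts and averages the gain both ways; the function names are hypothetical and the counts in the usage lines are toy values, not results from the paper.

```python
from statistics import mean


def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_f1_gain(pairs, relative=False):
    """Average F1 gain over (our_f1, baseline_f1) pairs, in percent.

    relative=False: percentage-point difference, (ours - baseline) * 100.
    relative=True:  relative difference, (ours - baseline) / baseline * 100
                    (baseline F1 must be non-zero).
    """
    gains = []
    for ours, baseline in pairs:
        diff = ours - baseline
        gains.append(diff / baseline * 100 if relative else diff * 100)
    return mean(gains)


# Toy counts for illustration only; they are not results from the paper.
ours = f1_score(tp=45, fp=5, fn=10)
baseline = f1_score(tp=35, fp=15, fn=20)
print(round(mean_f1_gain([(ours, baseline)]), 2))
print(round(mean_f1_gain([(ours, baseline)], relative=True), 2))
```

The two readings can differ noticeably when baseline F1 is low, so a reported average gain is only comparable across papers once the convention is known.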
computer science, theory & methods, software engineering