Hunting Vulnerable Smart Contracts via Graph Embedding Based Bytecode Matching

Jianjun Huang,Songming Han,Wei You,Wenchang Shi,Bin Liang,Jingzheng Wu,Yanjun Wu
DOI: https://doi.org/10.1109/TIFS.2021.3050051
IF: 7.231
2021-01-01
IEEE Transactions on Information Forensics and Security
Abstract:Smart contract vulnerabilities have attracted lots of concerns due to the resultant financial losses. Matching-based detection methods extrapolating known vulnerabilities to unknown have proven to be effective in other platforms. However, directly adopting the technique to smart contracts is obstructed by two issues, i.e., diversity of bytecode generation resulting from the rapid evolution of compilers and interference of noise code easily caused by the homogeneous business logics. To address the problems, we propose contract bytecode-oriented normalization and slicing techniques to augment bytecode matching. Specifically, we conduct data- and instruction-level normalizations to uniform the bytecode generated by different compilers, and enforce contract-specific slicing by tracking data- and control-flows with simulated bytecode executions to prune the noise code as far as possible. Based on the above techniques, we design an unsupervised graph embedding algorithm to encode the code graphs into quantitatively comparable vectors. The potentially vulnerable smart contracts can be identified by measuring the similarities between their vectors and known vulnerable ones. Our evaluations have shown the efficiency (0.47 seconds per contract on average), effectiveness (160 verified true positives) and high precision (91.95% for top-ranked). It is worth noting that, we also identify dozens of honeypot contracts, further demonstrating the capability of our method.
What problem does this paper attempt to address?