Bytecode Similarity Detection of Smart Contract across Optimization Options and Compiler Versions Based on Triplet Network
Di Zhu,Feng Yue,Jianmin Pang,Xin Zhou,Wenjie Han,Fudong Liu
DOI: https://doi.org/10.3390/electronics11040597
IF: 2.9
2022-02-15
Electronics
Abstract:In recent years, the number of smart contracts running in the blockchain has increased rapidly, accompanied by many security problems, such as vulnerability propagation caused by code reuse or vicious transaction caused by malicious contract deployment, for example. Most smart contracts do not publish the source code, but only the bytecode. Based on the research of bytecode similarity of smart contract, smart contract upgrade, vulnerability search and malicious contract analysis can be carried out. The difficulty of bytecode similarity research is that different compilation versions and optimization options lead to the diversification of bytecode of the same source code. This paper presents a solution, including a series of methods to measure the similarity of smart contract bytecode. Starting from the opcode of smart contract, a method of pre-training the basic block sequence of smart contract is proposed, which can embed the basic block vector. Positive samples were obtained by basic block marking, and the negative sampling method is improved. After these works, we put the obtained positive samples, negative samples and basic blocks themselves into the triplet network composed of transformers. Our solution can obtain evaluation results with an accuracy of 97.8%, so that the basic block sequence of optimized and unoptimized options can be transformed into each other. At the same time, the instructions are normalized, and the order of compiled version instructions is normalized. Experiments show that our solution can effectively reduce the bytecode difference caused by optimization options and compiler version, and improve the accuracy by 1.4% compared with the existing work. We provide a data set covering 64 currently used Solidity compilers, including one million basic block pairs extracted from them.
engineering, electrical & electronic,computer science, information systems,physics, applied