Ethereum Smart Contract Representation Learning for Robust Bytecode-Level Similarity Detection

Zhenzhou Tian, Yaqian Huang, Jie Tian, Zhongmin Wang, Yanping Chen, Lingwei Chen
DOI: https://doi.org/10.18293/seke2022-040
2022-01-01
Abstract: Smart contracts are programs that run on a blockchain, and Ethereum is one of the most popular platforms supporting them. Because deployed contracts are immutable, it is essential that smart contracts be free of bugs before deployment. However, various defects have been found in deployed smart contracts, causing huge economic losses and lowering people's trust. Writing secure smart contracts is far from trivial, and developers tend to turn to reliable resources or social coding platforms to reuse code. This leads to a large number of similar contracts with potential security risks. Therefore, detecting similarity among smart contracts helps to avoid vulnerabilities, identify threats, and improve the security of Ethereum. In this paper, we design a learning-effective and cost-efficient model, called SmartSD, for Ethereum smart contract similarity detection. Unlike current research efforts, SmartSD operates at the bytecode level and leverages deep neural networks to learn latent representations from the opcode sequences of smart contract bytecode, where representation learning and similarity measurement are supervised via siamese neural networks. Experimental evaluations demonstrate that SmartSD achieves 98.37% detection accuracy and a 0.9850 F1-score, outperforming EClone's 93.27% accuracy, while remaining computationally tractable and effectively mitigating the interference caused by compilers.
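To make the bytecode-level pipeline concrete, the following is a minimal Python sketch of the two stages the abstract names: turning bytecode into an opcode sequence, then comparing two contracts through a shared encoder, which is the siamese pattern. It is not SmartSD's implementation. The opcode table is a small subset of the EVM instruction set, the function names (`disassemble`, `encode`, `similarity`) are hypothetical, dropping PUSH operands is an illustrative normalization assumed here, and a bag-of-opcodes count vector with cosine similarity stands in for the learned neural representation.

```python
import math
from collections import Counter

# Small subset of the EVM opcode table (assumption: enough for illustration only).
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x03: "SUB",
    0x14: "EQ", 0x15: "ISZERO", 0x34: "CALLVALUE", 0x35: "CALLDATALOAD",
    0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE", 0x54: "SLOAD",
    0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST",
    0xF3: "RETURN", 0xFD: "REVERT",
}


def disassemble(bytecode_hex):
    """Convert hex bytecode into a list of opcode mnemonics, dropping PUSH operands."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    ops = []
    i = 0
    while i < len(code):
        b = code[i]
        if 0x60 <= b <= 0x7F:
            # PUSH1..PUSH32 carry 1..32 immediate bytes; skip them.
            n = b - 0x5F
            ops.append(f"PUSH{n}")
            i += n
        elif 0x80 <= b <= 0x8F:
            ops.append(f"DUP{b - 0x7F}")
        elif 0x90 <= b <= 0x9F:
            ops.append(f"SWAP{b - 0x8F}")
        else:
            ops.append(OPCODES.get(b, f"UNKNOWN_0x{b:02x}"))
        i += 1
    return ops


def encode(ops):
    """Stand-in encoder: opcode counts instead of a learned embedding."""
    return Counter(ops)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def similarity(hex_a, hex_b):
    """Siamese pattern: the same encoder is applied to both inputs before comparison."""
    return cosine(encode(disassemble(hex_a)), encode(disassemble(hex_b)))


if __name__ == "__main__":
    print(disassemble("0x6080604052"))
    print(similarity("0x6080604052", "0x60ff604052"))
```

In this sketch, two snippets that differ only in the constants they push map to the same opcode sequence and therefore score as near-identical. A learned model would replace `encode` with a trained sequence network and fit the comparison to labeled similar and dissimilar pairs.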