Enabling clone detection for Ethereum via smart contract birthmarks

Han Liu, Zhiqiang Yang, Yu Jiang, Wenqi Zhao, Jiaguang Sun
DOI: https://doi.org/10.1109/ICPC.2019.00024
2019-01-01
Abstract: The Ethereum ecosystem has introduced a pervasive blockchain platform with programmable transactions. Everyone is allowed to develop and deploy smart contracts. Such flexibility can lead to a large collection of similar contracts, i.e., clones, especially when Ethereum applications are highly domain-specific and may share similar functionalities within the same domain, e.g., token contracts often provide interfaces for money transfer and balance inquiry. While smart contract clones have a wide range of impact across different applications, e.g., security, they are relatively little studied. Although clone detection has been a long-standing research topic, blockchain smart contracts introduce new challenges, e.g., syntactic diversity due to the trade-off between storage and execution, and the need to understand high-level business logic. In this paper, we present the very first attempt at clone detection for Ethereum smart contracts. To overcome the new challenges, we introduce the concept of a smart contract birthmark, i.e., a semantic-preserving and computable representation of smart contract bytecode. The birthmark captures high-level semantics by effectively sketching symbolic execution traces (e.g., data access dependencies, path conditions) and maintains syntactic regularities (e.g., type and number of instructions) as well. Then, the clone detection problem is reduced to a computation of statistical similarity between two contract birthmarks. We have implemented a clone detector called EClone and evaluated it on Ethereum. The empirical results demonstrate the potential of EClone in accurately identifying clones. We have also extended EClone for vulnerability search and managed to detect CVE-2018-10376 instances.
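The reduction of clone detection to a similarity computation between two birthmarks can be illustrated with a minimal sketch. The abstract does not specify the birthmark layout or the similarity measure, so everything below is an assumption for illustration: the toy birthmark keeps only the syntactic part (instruction-type counts) and omits the symbolic execution features, cosine similarity stands in for the statistical similarity, and the opcode sequences are made up.

```python
from collections import Counter
from math import sqrt


def birthmark(opcodes):
    """Toy birthmark (assumption): count how often each instruction type appears."""
    return Counter(opcodes)


def cosine_similarity(a, b):
    """Cosine similarity between two count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty birthmark is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical opcode sequences; real input would come from disassembled bytecode.
token_a = ["PUSH1", "SLOAD", "PUSH1", "ADD", "SSTORE", "JUMPI"]
token_b = ["PUSH1", "SLOAD", "PUSH1", "SUB", "SSTORE", "JUMPI"]
unrelated = ["CALLVALUE", "DUP1", "ISZERO", "REVERT"]

sim_clone = cosine_similarity(birthmark(token_a), birthmark(token_b))
sim_other = cosine_similarity(birthmark(token_a), birthmark(unrelated))
print(f"near-clone: {sim_clone:.3f}, unrelated: {sim_other:.3f}")
```

In this toy setting a pair would be reported as a clone when its score exceeds a chosen threshold; the threshold and the feature set actually used by EClone are not given in the abstract.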