When Deep Learning Meets Smart Contracts

Zhipeng Gao
DOI: https://doi.org/10.1145/3324884.3418918
2020-08-07
Abstract: Ethereum has become a widely used platform to enable secure, blockchain-based financial and business transactions. However, many identified bugs and vulnerabilities in smart contracts have led to serious financial losses, raising serious concerns about smart contract security. Thus, there is a significant need to better maintain smart contract code and ensure its high reliability. In this research: (1) First, we propose an automated deep learning based approach to learn structural code embeddings of smart contracts in Solidity, which is useful for clone detection, bug detection and contract validation on smart contracts. We apply our approach to more than 22K Solidity contracts collected from the Ethereum blockchain; the results show that the clone ratio of Solidity code is around 90%, much higher than in traditional software. We collect a list of 52 known buggy smart contracts belonging to 10 kinds of common vulnerabilities as our bug database. Our approach can efficiently and accurately identify more than 1000 clone-related bugs based on our bug database. (2) Second, according to developers' feedback, we have implemented the approach in a web-based tool, named SmartEmbed, to make it easier for Solidity developers to use our approach. Our tool can assist Solidity developers in efficiently identifying repetitive smart contracts in the existing Ethereum blockchain, as well as in checking their contracts against a known set of bugs, which can help to improve users' confidence in the reliability of the contract. We optimize the implementation of SmartEmbed so that it can support developers in real time for practical use. Both the Ethereum ecosystem and individual Solidity developers can benefit from our research.
Software Engineering
What problem does this paper attempt to address?
This paper aims to solve the problems of code clone detection and vulnerability detection in Ethereum smart contracts. Specifically:

1. **Code Clone Detection**:
   - Smart contracts contain a large amount of repeated code, which may lead to serious problems such as security attacks and wasted resources. The paper proposes a deep-learning-based method that detects code clones by learning structural code embeddings of smart contracts. The experimental results show that the clone ratio of Solidity code is around 90%, much higher than that of traditional software.
   - This method can effectively identify more semantic clones and is more accurate than the commonly used clone detection tool Deckard.
2. **Vulnerability Detection**:
   - The paper collects 52 known buggy smart contracts covering 10 kinds of common vulnerabilities as a bug database. With this database, the method efficiently and accurately identifies more than 1,000 clone-related bugs.
   - New vulnerability checks can be added simply by generating code embeddings for new buggy contracts, with no additional manual effort to define vulnerability specifications.
3. **Practical Application**:
   - Based on feedback from Solidity developers, the author implemented this method as a web tool named SmartEmbed, which helps developers efficiently detect code clones and vulnerabilities and improves their confidence in contract reliability.
   - To meet the efficiency requirements of an online web tool, SmartEmbed has been optimized in three ways: (1) replacing multiple nested loop computations with matrix computations; (2) caching code embeddings to reduce redundant data loading; (3) creating indexes for smart contracts in the database to accelerate information retrieval.
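
The idea of checking a contract against a bug database by comparing code embeddings can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three-dimensional vectors, the entry names, the `0.95` threshold, and the use of cosine similarity (the summary does not state which similarity measure SmartEmbed uses). Real embeddings would be learned from Solidity code rather than written by hand.

```python
import math
from functools import lru_cache

# Hypothetical toy embeddings standing in for learned embeddings of
# known buggy contracts. Names and values are invented for illustration.
BUG_DATABASE = {
    "reentrancy_example": (0.9, 0.1, 0.4),
    "integer_overflow_example": (0.1, 0.8, 0.3),
}


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


@lru_cache(maxsize=None)
def load_bug_embeddings():
    """Stand-in for an expensive load; the cache makes it run only once."""
    return tuple(BUG_DATABASE.items())


def find_clone_related_bugs(contract_embedding, threshold=0.95):
    """Return (bug_name, score) pairs at or above the threshold, best first."""
    matches = []
    for name, bug_embedding in load_bug_embeddings():
        score = cosine_similarity(contract_embedding, bug_embedding)
        if score >= threshold:
            matches.append((name, score))
    return sorted(matches, key=lambda m: m[1], reverse=True)


if __name__ == "__main__":
    # A contract whose embedding matches the first database entry.
    print(find_clone_related_bugs((0.9, 0.1, 0.4)))
```

Adding a new vulnerability check in this sketch amounts to adding one more embedding to `BUG_DATABASE`, which mirrors the point that no hand-written vulnerability specification is needed. The explicit loop is kept for readability; with a numerical library it could be replaced by a single matrix product over all database embeddings, in the spirit of the first optimization listed above.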
In conclusion, this research provides an efficient and accurate solution for detecting code clones and vulnerabilities in smart contracts through deep learning, thereby improving the security and reliability of smart contracts.