Improving Smart Contract Security with Contrastive Learning-based Vulnerability Detection

Yizhou Chen, Zeyu Sun, Zhihao Gong, Dan Hao
2024-04-27
Abstract: Smart contract vulnerabilities (SCVs) have emerged as a major threat to the transaction security of blockchains. Existing state-of-the-art methods rely on deep learning to mitigate this threat. They treat each input contract as an independent entity and feed it into a deep learning model that learns vulnerability patterns by fitting vulnerability labels. However, they disregard the correlation between contracts: they consider neither the commonalities between contracts of the same type nor the differences among contracts of different types. As a result, the performance of these methods falls short of the desired level.
Cryptography and Security, Software Engineering
What problem does this paper attempt to address?
The paper addresses the detection of smart contract vulnerabilities (SCVs). As blockchain systems and their smart contracts are increasingly used in personal and commercial scenarios, they have become targets for cybercriminals who exploit software vulnerabilities for illegal profit. Smart contract vulnerabilities pose a significant threat to the security of blockchain transactions: they allow attackers to steal virtual assets and cause substantial financial losses to users. For example, in 2016 the Decentralized Autonomous Organization (The DAO) on Ethereum was attacked and approximately $50 million worth of Ether was stolen; in 2018 the decentralized exchange Bancor lost about $23.5 million in cryptocurrency due to smart contract vulnerabilities.
Current smart contract vulnerability detection methods fall into two categories: rule-based methods and deep learning-based methods. Rule-based methods rely on predefined rules or manually defined patterns of smart contract code and execution to identify SCVs. Predefined patterns, however, can hardly cover every possible type of vulnerability, and developing them is time-consuming and error-prone. Deep learning-based methods instead learn vulnerability patterns with neural networks and achieve better performance. However, they typically treat each input contract as an independent entity and ignore the correlations between contracts, namely the commonalities among contracts of the same type and the differences between contracts of different types. As a result, their performance has not reached the expected level.
To address these issues, the paper proposes Clear, a new contrastive learning-enhanced method for automatically identifying SCVs.
Clear takes the correlation between smart contracts into account: a contrastive learning model (CL model) compares contracts in pairs to learn how they are related. The existing vulnerability labels are reused to generate correlation labels that guide the training of the CL model, which in turn improves smart contract vulnerability detection (SCVD). Experimental results on a large-scale real-world dataset (over 40,000 smart contracts) show that Clear has significant advantages over 13 state-of-the-art SCVD methods on multiple evaluation metrics; in particular, it improves the F1 score by 9.73%-39.99% compared to the best existing methods. Furthermore, Clear clusters vulnerable contracts together and separates them from non-vulnerable contracts in the feature space, and the proposed CL model also enhances RNN-based models (such as RNN, LSTM, GRU), improving their performance by 40.51%-50.94%.
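The idea of deriving correlation labels from vulnerability labels and training on contract pairs can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that two contracts count as correlated when their vulnerability labels match, and it swaps in the classic margin-based pairwise contrastive loss because the summary does not state the loss Clear uses. The function names (`correlation_label`, `make_pairs`, `contrastive_loss`) and the `margin` parameter are hypothetical, and the embeddings are plain lists of floats standing in for the output of some contract encoder.

```python
import math
from itertools import combinations


def correlation_label(label_a, label_b):
    """Assumed rule: a pair is correlated (1) when both vulnerability labels match."""
    return 1 if label_a == label_b else 0


def make_pairs(labels):
    """Build (i, j, correlation) triples for every unordered pair of contracts."""
    return [
        (i, j, correlation_label(labels[i], labels[j]))
        for i, j in combinations(range(len(labels)), 2)
    ]


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, corr, margin=1.0):
    """Margin-based pairwise contrastive loss (an assumed stand-in, not Clear's loss).

    Correlated pairs are pulled together (penalty d^2); uncorrelated pairs
    are pushed apart until they are at least `margin` away from each other.
    """
    d = euclidean(u, v)
    if corr == 1:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    # Toy data: 1 = vulnerable, 0 = non-vulnerable (hypothetical labels).
    labels = [1, 1, 0]
    embeddings = [[0.0, 0.0], [0.1, 0.0], [3.0, 4.0]]
    for i, j, corr in make_pairs(labels):
        loss = contrastive_loss(embeddings[i], embeddings[j], corr)
        print(i, j, corr, round(loss, 4))
```

Minimizing a loss of this kind over many pairs is one way to obtain the behavior described above, where vulnerable contracts cluster together in the feature space and sit apart from non-vulnerable ones. No new annotation is needed, since every correlation label is computed from labels the dataset already has.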