Automatic Identification of Crash-inducing Smart Contracts
Chao Ni, Cong Tian, Kaiwen Yang, David Lo, Jiachi Chen, Xiaohu Yang
DOI: https://doi.org/10.1109/saner56733.2023.00020
2023-01-01
Abstract: Smart contracts, special programs that run on and reside in the blockchain, broaden the applications of blockchain and allow assets to be exchanged without depending on external parties. Because the blockchain is immutable, smart contracts cannot be modified once deployed. Thus, the contracts and their records persist on the blockchain forever. This includes failed transactions, which are caused by runtime errors and waste computation, storage, and fees. In this paper, we refer to smart contracts that cause runtime errors as crash-inducing smart contracts. However, the automatic identification of crash-inducing smart contracts has received limited investigation in the literature. Existing approaches to identifying crash-inducing smart contracts are either limited in the vulnerabilities they can find (e.g., pattern-based static analysis) or very expensive (e.g., program analysis), which makes them insufficient for Ethereum. To reduce runtime errors on Ethereum, we propose CRASHSCDET, an efficient, generalizable, machine learning-based detector that automatically identifies crash-inducing smart contracts. To investigate the effectiveness of CRASHSCDET, we first propose 34 static source code metrics from four dimensions (i.e., complexity metrics, count metrics, object-oriented metrics, and Solidity-specific metrics) to characterize smart contracts. Then, we collect a large-scale dataset of 54,739 verified smart contracts and label them based on their execution traces on Etherscan. We make a comprehensive comparison with three state-of-the-art approaches. The results show that CRASHSCDET achieves good performance (i.e., an average F1-measure of 0.937 and an average AUC of 0.980). It also statistically significantly improves on the baselines by 0.5%-60.4% in terms of F1-measure and by 41.2%-44.3% in terms of AUC, which indicates that static source code metrics are effective for identifying crash-inducing smart contracts.
We further investigate the importance of different types of metrics. We find that metrics in different dimensions vary in their ability to characterize smart contracts. In particular, metrics in the "Count" dimension are the most discriminative, but combining all metrics achieves better prediction performance.
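The following minimal sketch makes the idea of static source code metrics and the F1-measure concrete. It assumes Python and naive regex-based parsing. The metric names (`loc`, `functions`, `requires`, `modifiers`) and the helpers `count_metrics` and `f1_measure` are hypothetical illustrations. They are not the paper's 34 metrics or the CRASHSCDET implementation. A real tool would use a Solidity parser rather than regular expressions.

```python
import re
from typing import Dict


def count_metrics(source: str) -> Dict[str, int]:
    """Compute a few hypothetical count-style metrics from Solidity source text.

    Assumption: naive regex handling; comment markers inside string literals
    are not treated specially.
    """
    # Remove /* ... */ block comments (across lines), then // line comments.
    no_block = re.sub(r"/\*.*?\*/", "", source, flags=re.S)
    code = re.sub(r"//[^\n]*", "", no_block)
    # Keep only lines that still contain code after comment removal.
    lines = [ln for ln in code.splitlines() if ln.strip()]
    return {
        "loc": len(lines),  # non-blank, non-comment lines
        "functions": len(re.findall(r"\bfunction\b", code)),
        "requires": len(re.findall(r"\brequire\s*\(", code)),
        "modifiers": len(re.findall(r"\bmodifier\b", code)),
    }


def f1_measure(tp: int, fp: int, fn: int) -> float:
    """F1-measure: harmonic mean of precision and recall for the positive class."""
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    src = (
        "pragma solidity ^0.8.0;\n"
        "// simple vault\n"
        "contract Vault {\n"
        "    mapping(address => uint) balances;\n"
        "    modifier positive(uint x) { require(x > 0); _; }\n"
        "    function withdraw(uint amt) public positive(amt) {\n"
        "        require(balances[msg.sender] >= amt);\n"
        "        balances[msg.sender] -= amt;\n"
        "    }\n"
        "}\n"
    )
    print(count_metrics(src))
    print(f1_measure(tp=8, fp=2, fn=2))
```

In a learning-based detector of this kind, each contract's metric dictionary would become one feature vector, and a classifier trained on the labeled contracts would predict whether a contract is crash-inducing. The classifier and the exact features used by CRASHSCDET are not specified here.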