Bytecode-based Approach for Ethereum Smart Contract Classification

LIN Dan, LIN Kaixin, WU Jiajing, ZHENG Zibin
DOI: https://doi.org/10.11959/j.issn.2096-109x.2022046
2022-01-01
Abstract: In recent years, blockchain technology has attracted wide attention and been applied in many fields, including finance, medical care and government affairs. However, owing to the immutability of smart contracts and the particularity of their operating environment, security issues occur frequently. On the one hand, contract developers introduce code security problems when writing contracts; on the other hand, Ethereum hosts many high-risk smart contracts, and ordinary users are easily attracted by the high returns these contracts promise while having no way to assess their risks. Research on smart contract security, however, focuses mainly on code security, and there is relatively little work on identifying contract functions. Accurately classifying smart contracts by function would help people better understand contract behavior, safeguard the smart contract ecosystem, and reduce or recover user losses. Existing smart contract classification methods often rely on analyzing source code, but deploying a contract on Ethereum only requires bytecode, and only a very small number of contracts publish their source code. Therefore, a bytecode-based classification method for Ethereum smart contracts is proposed. Ethereum smart contract bytecode and the corresponding category labels are collected, and opcode frequency features and control flow graph features are then extracted. Feature importance is analyzed experimentally to obtain a suitable graph vector dimension and the best classification model. Finally, the method is verified experimentally on a multi-class task with five categories (exchange, finance, gambling, game and high risk), where the XGBoost classifier reaches an F1 score of 0.9138. Experimental results show that the method performs the classification task for Ethereum smart contracts well and can be applied to predicting smart contract categories in practice.
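
To illustrate the opcode frequency features mentioned in the abstract, the following is a minimal sketch of counting opcodes in raw EVM bytecode. It is not the authors' implementation: the function names `opcode_name` and `opcode_counts` are hypothetical, the opcode table is deliberately partial, and the abstract does not say whether raw counts or normalized frequencies are used, so raw counts are assumed here.

```python
from collections import Counter

# Partial EVM opcode table (assumption: enough for illustration only).
# Bytes not listed here are reported as UNKNOWN_0x.. rather than guessed.
OPCODE_NAMES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x03: "SUB",
    0x10: "LT", 0x14: "EQ", 0x15: "ISZERO",
    0x34: "CALLVALUE", 0x35: "CALLDATALOAD", 0x36: "CALLDATASIZE",
    0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE",
    0x54: "SLOAD", 0x55: "SSTORE",
    0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST",
    0xF1: "CALL", 0xF3: "RETURN", 0xFD: "REVERT", 0xFF: "SELFDESTRUCT",
}


def opcode_name(byte):
    """Map one bytecode byte to an opcode mnemonic."""
    if 0x60 <= byte <= 0x7F:          # PUSH1 .. PUSH32
        return f"PUSH{byte - 0x5F}"
    if 0x80 <= byte <= 0x8F:          # DUP1 .. DUP16
        return f"DUP{byte - 0x7F}"
    if 0x90 <= byte <= 0x9F:          # SWAP1 .. SWAP16
        return f"SWAP{byte - 0x8F}"
    return OPCODE_NAMES.get(byte, f"UNKNOWN_0x{byte:02x}")


def opcode_counts(bytecode_hex):
    """Count opcodes in hex-encoded bytecode, skipping PUSH immediates."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)

    counts = Counter()
    pc = 0
    while pc < len(code):
        byte = code[pc]
        counts[opcode_name(byte)] += 1
        # PUSHn is followed by n bytes of data that are not opcodes.
        immediate = byte - 0x5F if 0x60 <= byte <= 0x7F else 0
        pc += 1 + immediate
    return counts


if __name__ == "__main__":
    # PUSH1 0x80, PUSH1 0x40, MSTORE
    print(opcode_counts("0x6080604052"))
```

Skipping the immediate bytes after each PUSH matters because those bytes are data and would otherwise be miscounted as opcodes. A count vector like this could be combined with a graph embedding of the control flow graph and passed to a classifier, though the exact feature layout used in the paper is not given in the abstract.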