COBRA: Interaction-Aware Bytecode-Level Vulnerability Detector for Smart Contracts

Wenkai Li, Xiaoqi Li, Zongwei Li, Yuqing Zhang
2024-10-28
Abstract: The detection of vulnerabilities in smart contracts remains a significant challenge. While numerous tools are available for analyzing smart contracts in source code, only about 1.79% of smart contracts on Ethereum are open-source. Most existing tools that target bytecode consider only the semantic logic context and disregard the function interface information in the bytecode. In this paper, we propose COBRA, a novel framework that integrates semantic context and function interfaces to detect vulnerabilities in smart contract bytecode. To the best of our knowledge, COBRA is the first framework that combines these two features. Moreover, to infer the function signatures that are not present in signature databases, we present SRIF (Signatures Reverse Inference from Functions), which automatically learns the rules of function signatures from smart contract bytecode. The bytecode associated with the function signatures is collected by constructing a control flow graph (CFG) for SRIF training. We optimize the semantic context using the operation codes in static single assignment (SSA) format. Finally, we integrate the context and function interface representations in the latent space as the contract feature embedding. The contract features in the hidden space are decoded for vulnerability classification with a decoder and attention module. Experimental results demonstrate that SRIF achieves a 94.76% F1-score for function signature inference. Furthermore, when the ground-truth ABI exists, COBRA achieves a 93.45% F1-score for vulnerability classification. In the absence of the ABI, the inferred function features fill the encoder, and the system achieves an 89.46% recall rate.
Cryptography and Security
What problem does this paper attempt to address?
This paper addresses vulnerability detection in smart contracts. Although multiple tools can analyze smart-contract source code, only about 1.79% of Ethereum smart contracts are open-source. Most existing tools that target bytecode consider only the semantic-logic context and ignore the function-interface information in the bytecode. The paper therefore proposes COBRA, a framework that combines the semantic context and function interfaces to detect vulnerabilities in smart-contract bytecode. According to the authors, it is the first framework to combine these two features.

### Main Contributions of the Paper

1. **SRIF (Signatures Reverse Inference from Functions)**: Uses a seq2seq structure to extract function input parameters from the semantic context. Specifically, it infers function properties by counting specific opcodes and jointly maps them into function features.
2. **COBRA Framework**: Integrates the semantic context and ABI features into a novel encoder. COBRA generates an embedded representation of the smart contract, which the decoder uses for vulnerability classification.
3. **Experimental Results**: SRIF achieves an F1 score above 94% for function-signature inference. When the original ABI is available, COBRA achieves an F1 score above 93% for vulnerability classification; even without the ABI, it reaches a recall rate of over 89% using the inferred function representation.
4. **Open-Sourcing of Datasets and Code**: The relevant datasets and code have been open-sourced and can be obtained at the specified link.

### Framework Overview

The main steps of the COBRA framework are as follows:

1. **Context Extraction**: Extract opcodes in the original and SSA formats from the smart-contract bytecode.
2. **ABI Acquisition**: Crawl the original ABI data from Etherscan based on the contract address.
3. **Signature Inference**: If ABI information is missing, use SRIF to infer function signatures and properties.
4. **Property Summarization**: Infer properties of functions, such as state mutability and payment properties.
5. **Vulnerability Detection**: Combine the processed semantic context and function representation, and use the decoder and attention module to generate a vulnerability report.

### Key Technologies

- **Context Extraction**: Extract opcodes and function IDs by decompiling the bytecode and constructing a control-flow graph (CFG).
- **Signature Inference**: Use the SRIF framework, combined with opcodes and function context, to infer function signatures.
- **Property Summarization**: Infer the state mutability and payment properties of functions for use in vulnerability detection.
- **Vulnerability Detection**: Use a decoder module with an attention mechanism to classify vulnerabilities in smart-contract bytecode.

### Experimental Results

- **Function-Signature Inference**: SRIF reaches an F1 score of 94.76% for function-signature inference.
- **Vulnerability Classification**: When the ground-truth ABI is available, COBRA reaches an F1 score of 93.45% for vulnerability classification; even without the ABI, it achieves a recall rate of 89.46% using the inferred function features.

### Conclusion

By combining semantic-context and function-interface information, the COBRA framework improves the accuracy of vulnerability detection in smart-contract bytecode. The method addresses a gap left by existing bytecode-level tools, which ignore function interfaces, and offers a new approach to smart-contract security.
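To make the ideas of extracting function IDs from bytecode and counting specific opcodes more concrete, here is a minimal Python sketch. It is not the authors' implementation and does not reproduce SRIF or COBRA. It assumes the common compiler dispatcher pattern in which a 4-byte function selector is pushed with `PUSH4` and compared with `EQ`; real contracts may deviate from this pattern. The helper names (`disassemble`, `extract_selectors`, `count_opcodes`) and the hand-assembled `SAMPLE` bytecode are hypothetical and exist only for illustration.

```python
from collections import Counter

# Opcode byte values from the EVM instruction set (only those used here).
PUSH1, PUSH4, PUSH32 = 0x60, 0x63, 0x7F
EQ = 0x14
# Opcodes counted as crude hints about payment and state access (assumption).
NAMES = {0x34: "CALLVALUE", 0x54: "SLOAD", 0x55: "SSTORE"}


def disassemble(code: bytes):
    """Yield (offset, opcode, operand) triples, skipping PUSH immediates."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        # PUSH1..PUSH32 carry 1..32 bytes of inline data after the opcode.
        size = op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0
        yield pc, op, code[pc + 1 : pc + 1 + size]
        pc += 1 + size


def extract_selectors(code: bytes):
    """Collect operands of PUSH4 instructions that are directly followed by EQ."""
    instrs = list(disassemble(code))
    found = []
    for (_, op, arg), (_, next_op, _) in zip(instrs, instrs[1:]):
        if op == PUSH4 and next_op == EQ and len(arg) == 4:
            found.append(arg.hex())
    return found


def count_opcodes(code: bytes):
    """Count occurrences of the opcodes listed in NAMES."""
    return Counter(NAMES[op] for _, op, _ in disassemble(code) if op in NAMES)


# Hand-assembled dispatcher fragment (illustrative, not a deployable contract).
SAMPLE = bytes.fromhex(
    "80" "63a9059cbb" "14" "610010" "57"  # DUP1 PUSH4 EQ PUSH2 JUMPI
    "80" "6370a08231" "14" "610020" "57"  # DUP1 PUSH4 EQ PUSH2 JUMPI
    "34" "55"                             # CALLVALUE SSTORE
)
```

Calling `extract_selectors(SAMPLE)` returns `['a9059cbb', '70a08231']`, which are the well-known ERC-20 selectors for `transfer(address,uint256)` and `balanceOf(address)`. The opcode counts are only a rough signal: for example, the presence of `SSTORE` suggests a function may modify state, but a real analysis would need to attribute opcodes to individual functions through the control-flow graph, as the summary describes.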