The Incredible Shrinking Context... in a decompiler near you

Sifis Lagouvardos, Yannis Bollanos, Neville Grech, Yannis Smaragdakis
2024-09-17
Abstract: Decompilation of binary code has arisen as a highly-important application in the space of Ethereum VM (EVM) smart contracts. Major new decompilers appear nearly every year and attain popularity, for a multitude of reverse-engineering or tool-building purposes. Technically, the problem is fundamental: it consists of recovering high-level control flow from a highly-optimized continuation-passing-style (CPS) representation. Architecturally, decompilers can be built using either static analysis or symbolic execution techniques. We present Shrnkr, a static-analysis-based decompiler succeeding the state-of-the-art Elipmoc decompiler. Shrnkr manages to achieve drastic improvements relative to the state of the art, in all significant dimensions: scalability, completeness, precision. Chief among the techniques employed is a new variant of static analysis context: shrinking context sensitivity. Shrinking context sensitivity performs deep cuts in the static analysis context, eagerly "forgetting" control-flow history, in order to leave room for further precise reasoning. We compare Shrnkr to state-of-the-art decompilers, both static-analysis- and symbolic-execution-based. In a standard benchmark set, Shrnkr scales to over 99.5% of contracts (compared to ~95%), covers (i.e., reaches and manages to decompile) 67% more code, and reduces key imprecision metrics by over 65%.
Programming Languages
What problem does this paper attempt to address?
The paper addresses the decompilation of EVM (Ethereum Virtual Machine) smart contracts. Specifically, it aims to recover high-level control flow and program structure from highly optimized, low-level binary code that is effectively in continuation-passing style (CPS).

### Problem Background

1. **Importance of Decompilation**:
   - Smart contracts are small programs on blockchain platforms such as Ethereum, usually written in high-level languages like Solidity and deployed on the blockchain as bytecode.
   - Decompiling these contracts matters for many applications, such as automated analysis, security vulnerability detection, and understanding competitors' trading strategies.
2. **Technical Challenges**:
   - EVM bytecode is very low-level: all control flow is implemented through jump instructions, which makes it difficult to recover high-level control-flow structures.
   - The compiler merges and optimizes basic blocks so that the same instruction sequence appears only once, which makes decompilation harder.
   - Computing the control-flow graph (CFG) is complex, especially in the presence of dynamic jumps.

### Core Contributions of the Paper

The paper introduces Shrnkr, a new static-analysis-based decompiler that significantly improves on existing decompilers in scalability, completeness, and precision. Its main innovations include:

1. **Shrinking Context Sensitivity**:
   - A new static analysis context abstraction. It reduces context depth by eagerly "forgetting" control-flow history, leaving room for further precise reasoning.
   - When a function call returns, the context elements related to that call are discarded, while important call information is kept.
2. **Block Cloning**:
   - A block-cloning transformation performed before the global analysis helps normalize the decompilation output and improves precision.
3. **Pre-analysis-guided Elimination of Spurious Calls**:
   - A pre-analysis prepares the main analysis so that it produces fewer imprecise results.

### Experimental Results

Experiments show that Shrnkr improves significantly over existing decompilers (such as Elipmoc and Heimdall-rs) in several respects:

- **Scalability**: Shrnkr scales to over 99.5% of contracts, compared to about 95% for Elipmoc.
- **Completeness**: Shrnkr covers (i.e., reaches and manages to decompile) 67% more code.
- **Precision**: Shrnkr reduces key imprecision metrics by over 65%.

### Summary

By introducing techniques such as shrinking context sensitivity, the paper significantly improves the quality of EVM smart contract decompilation and removes bottlenecks that existing tools hit on complex contracts. This improves both the scalability and the precision of decompilation and provides a foundation for further research and applications.
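
To make the shrinking-context idea described above more concrete, here is a minimal Python sketch. It is not Shrnkr's actual algorithm or implementation. It only contrasts a plain depth-bounded context (append, then keep the `k` most recent entries) with a variant that additionally drops a call's entries when that call returns. The function names (`push_k_limited`, `shrink_on_return`, `run`), the event trace, and the bound `k = 3` are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Context = Tuple[str, ...]          # a context is a bounded sequence of call-site labels
Event = Tuple[str, str]            # ("call" | "ret", call-site label)


def push_k_limited(ctx: Context, site: str, k: int) -> Context:
    """Append a call site and keep only the k most recent entries."""
    return (ctx + (site,))[-k:]


def shrink_on_return(ctx: Context, site: str) -> Context:
    """Drop the most recent entry for `site` and everything recorded after it."""
    if site not in ctx:
        return ctx  # the entry was already pushed out of the bounded context
    last = len(ctx) - 1 - ctx[::-1].index(site)
    return ctx[:last]


def run(events: List[Event], k: int, shrink: bool) -> Context:
    """Replay a trace of calls and returns, returning the final context."""
    ctx: Context = ()
    for kind, site in events:
        if kind == "call":
            ctx = push_k_limited(ctx, site, k)
        elif kind == "ret" and shrink:
            ctx = shrink_on_return(ctx, site)
    return ctx


# Hypothetical trace: calls at sites A, B, C; the call at C returns; then a call at D.
trace = [("call", "A"), ("call", "B"), ("call", "C"), ("ret", "C"), ("call", "D")]

plain = run(trace, k=3, shrink=False)   # the finished call C still occupies a slot
shrunk = run(trace, k=3, shrink=True)   # C was forgotten on return, so A survives

print(plain)
print(shrunk)
```

In this toy trace, the plain bounded context spends one of its three slots on a call that has already returned and loses the oldest entry as a result, while the shrinking variant frees that slot and retains the older history. This is the sense in which forgetting history can leave room for further precise reasoning.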