EFACT: an External Function Auto-Completion Tool to Strengthen Static Binary Lifting

Yilei Zhang, Haoyu Liao, Zekun Wang, Bo Huang, Jianmei Guo
2024-05-15
Abstract: Static binary lifting is essential in binary rewriting frameworks. Existing tools overlook the impact of External Function Completion (EXFC) in static binary lifting. EXFC recovers the prototypes of External Functions (EXFs, functions defined in standard shared libraries) using only the function symbols available. Incorrect EXFC can misinterpret the source binary, or cause memory overflows in static binary translation, which eventually results in program crashes. Notably, existing tools struggle to recover the prototypes of mangled EXFs originating from binaries compiled from C++. Moreover, they require time-consuming manual processing to support new libraries.
Software Engineering
What problem does this paper attempt to address?
This paper addresses External Function Completion (EXFC) in static binary lifting. The authors identify two main problems in how current static binary lifting tools handle external function declarations:

1. **Incorrect recovery of function parameters and return types**:
   - Existing tools perform poorly at recovering the declarations of mangled C++ functions and of variable-parameter functions. For example, some tools treat every unrecognized case uniformly as an `int` or `i32/i64` type, which can lead to precision errors or memory overflows and ultimately crash the program.
   - The template specialization mechanism in C++ makes these function declarations especially hard for existing tools to recover accurately.
2. **Difficulties in integrating new libraries or migrating to other frameworks**:
   - Current tools require extensive manual processing to support new libraries or to migrate to other binary lifting frameworks, which is both complex and time-consuming. In addition, the EXFC part of existing tools is tightly coupled with other components, so users need a deep understanding of the project structure before they can extend it.
   - For example, although some tools provide interfaces for supplying external function declarations manually, doing so is very difficult because external libraries are large and complex.

To solve these problems, the authors propose EFACT (External Function Auto-Completion Tool), an external function auto-completion tool for static binary lifting.
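To make the first problem concrete, the following minimal Python sketch models what happens when a lifter falls back to an `i32` return type for a function that actually returns a 64-bit pointer or a `double`. It is a simplified illustration rather than anything taken from EFACT: the helper `read_as_i32` and the pointer value are hypothetical, and the return value is modelled as a plain 8-byte little-endian image. On real ABIs such as x86-64 System V, a `double` is returned in a floating-point register, so a wrong declaration would read unrelated data instead.

```python
import struct


def read_as_i32(value_image: bytes) -> int:
    """Read the low 4 bytes of an 8-byte little-endian value as a signed i32."""
    return struct.unpack("<i", value_image[:4])[0]


# Case 1: a 64-bit pointer (illustrative value) wrongly declared as returning i32.
pointer = 0x00007F3A12345678
truncated_pointer = read_as_i32(struct.pack("<Q", pointer))

# Case 2: the double 2.5 wrongly declared as returning i32.
misread_double = read_as_i32(struct.pack("<d", 2.5))

print(hex(pointer), "->", hex(truncated_pointer))  # upper 32 bits are lost
print(2.5, "->", misread_double)                   # bit pattern misread as an integer
```

In both cases the lifted program keeps running with a wrong value, which is how a prototype error can surface later as an invalid memory access or a crash.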
EFACT improves on existing tools in the following aspects:

- **More accurate function declaration recovery**: For mangled C++ functions in particular, EFACT introduces an MFC algorithm that recovers the return types and implicit `this` parameters of these functions more accurately and handles the template specialization mechanism.
- **Automatic dictionary generator**: EFACT provides a Dictauto-generator that extracts complete function declarations from external libraries and catalogs them in a dictionary. The dictionary supplies the information that LLVM's Demangling API does not provide and is used to handle fixed-parameter functions (FPC), variable-parameter functions (VPC), and mangled functions (MFC).
- **Multi-dimensional library database**: Building on the Dictauto-generator, EFACT maintains a multi-level library database organized along several dimensions, including ISA, processor architecture, operating system, and library version. It relies on backward compatibility to extend coverage to more platforms and library versions.
- **Easy to integrate and extend**: EFACT takes only ELF files as input and produces output as LLVM IR or as C/C++ programs, which makes it suitable as a plugin for various binary rewriting frameworks. It also supports binaries compiled from other programming languages and binaries that use external functions from other libraries.

Together, these improvements raise the accuracy and efficiency of static binary lifting and address the limitations of existing tools in handling external function declarations.
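The dictionary and the layered database described above can be pictured with a short Python sketch. It is an illustration under stated assumptions, not EFACT's actual data model: the `Prototype` class, the `DATABASE` layout, the `lookup` function, and the version numbers are all hypothetical. The backward-compatibility rule used here (reuse the newest cataloged version that is not newer than the requested one) is one plausible reading of the summary. The entry `_ZNSo3putEc` is the mangled name of `std::ostream::put(char)`, which encodes neither the return type nor the implicit `this` parameter, so both have to come from the dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Version = Tuple[int, int]
Key = Tuple[str, str, str, Version]  # (isa, os, library, version), hypothetical layout


@dataclass(frozen=True)
class Prototype:
    ret: str
    params: Tuple[str, ...]
    variadic: bool = False

    def render(self, name: str) -> str:
        """Render a C-style declaration for the given symbol name."""
        parts = list(self.params) + (["..."] if self.variadic else [])
        return f"{self.ret} {name}({', '.join(parts) or 'void'})"


# Illustrative entries only; versions and contents are assumptions.
DATABASE: Dict[Key, Dict[str, Prototype]] = {
    ("x86_64", "linux", "libc", (2, 31)): {
        "strlen": Prototype("size_t", ("const char *",)),
        "printf": Prototype("int", ("const char *",), variadic=True),
    },
    ("x86_64", "linux", "libstdc++", (6, 0)): {
        # First parameter is the implicit `this` pointer.
        "_ZNSo3putEc": Prototype("std::ostream &", ("std::ostream *", "char")),
    },
}


def lookup(isa: str, os_name: str, library: str,
           version: Version, symbol: str) -> Optional[Prototype]:
    """Find a prototype, falling back to older cataloged versions of the library."""
    candidates = sorted(
        (v for (i, o, l, v) in DATABASE
         if (i, o, l) == (isa, os_name, library) and v <= version),
        reverse=True,
    )
    for v in candidates:
        proto = DATABASE[(isa, os_name, library, v)].get(symbol)
        if proto is not None:
            return proto
    return None  # report "unknown" rather than guessing `int`


proto = lookup("x86_64", "linux", "libc", (2, 35), "printf")
print(proto.render("printf") if proto else "unknown")
```

Returning `None` for an uncataloged symbol, instead of a default `int` prototype, keeps the failure visible to the caller, which is the opposite of the uniform fallback criticized in the problem statement.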