Identifying Functions in Binary Code with Reverse Extended Control Flow Graphs

Jing Qiu, Xiaohong Su, Peijun Ma
DOI: https://doi.org/10.1002/smr.1733
2015-01-01
Journal of Software: Evolution and Process
Abstract: In binary code analysis, current function identification approaches are challenged by functions without explicit call sites and handcrafted assembly without standard prologues/epilogues. We propose a new function representation called a reverse extended control flow graph (RECFG) and a RECFG‐based method for identifying functions in stripped binary code. A function has at least one return instruction (an instruction that makes the control flow leave a function). Therefore, return instructions are more reliable than the function prologues and epilogues used by traditional methods. We first build RECFGs from any values that can be interpreted as return instructions in a code range. Then, for each independent RECFG, the multiple‐decision method chooses a subgraph as the control flow graph of a function. A prototype tool is developed for evaluation on seven open source applications, 138 binaries in MASM32 code examples, and 292 binaries in Windows XP SP3. Experimental results show that the proposed method can identify functions that cannot be identified by current methods with high precision and stable recall. Copyright © 2015 John Wiley & Sons, Ltd.
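The first step described in the abstract, collecting every value in a code range that can be interpreted as a return instruction, can be illustrated with a minimal sketch. This is not the authors' tool: it assumes 32-bit x86 code, treats only the four standard return opcodes (C3, C2 iw, CB, CA iw) as candidates, and uses the hypothetical names `RET_OPCODES` and `find_return_candidates`. The backward construction of an RECFG from each candidate and the multiple-decision selection step are not shown.

```python
# Illustrative sketch only; names and structure are assumptions, not the paper's code.
# x86 return opcodes mapped to their encoded length in bytes:
#   C3 = ret, C2 iw = ret imm16, CB = retf, CA iw = retf imm16
RET_OPCODES = {0xC3: 1, 0xC2: 3, 0xCB: 1, 0xCA: 3}


def find_return_candidates(code: bytes, base: int = 0):
    """Return (address, length) for every byte that could start a return instruction.

    Every matching byte is kept, even if it is really data or the middle of
    another instruction; later analysis would have to reject false candidates.
    """
    candidates = []
    for offset, byte in enumerate(code):
        length = RET_OPCODES.get(byte)
        # Skip candidates whose immediate operand would run past the code range.
        if length is not None and offset + length <= len(code):
            candidates.append((base + offset, length))
    return candidates


# push ebp; mov ebp, esp; pop ebp; ret; nop; ret 8
sample = bytes([0x55, 0x8B, 0xEC, 0x5D, 0xC3, 0x90, 0xC2, 0x08, 0x00])
for address, length in find_return_candidates(sample, base=0x401000):
    print(hex(address), length)
```

Each candidate address would then serve as the starting point for building one reverse graph; a byte-level scan like this deliberately over-approximates, which is why a later selection step is needed.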