Investigating Neural-based Function Name Reassignment from the Perspective of Binary Code Representation

Guoqiang Chen, Han Gao, Jie Zhang, Yanru He, Shaoyin Cheng, Weiming Zhang
DOI: https://doi.org/10.1109/PST58708.2023.10320193
2023-01-01
Abstract: A model that reassigns descriptive names to binary functions is of considerable assistance in reverse engineering. Existing methods for this problem are based on low-level representations of binary code (e.g., assembly code); recent approaches in particular apply neural models to instruction sequences. However, their performance is still unsatisfactory. Meanwhile, modern decompilers provide lifted representations of binary code, and the effectiveness of these representations has not been adequately studied. This paper further explores function name reassignment from the perspective of binary code representation. Specifically, we present a general and flexible NEural-based function name Reassignment framework, NER, which uses a decompiler to obtain a specific representation and applies the corresponding serialization strategy to it. NER then uses an alternative neural network to make predictions. Three levels of representation are investigated: assembly code, Intermediate Representation (IR), and pseudo-code. We observe that the binary code representation has a significant effect on final performance, and that pseudo-code is the most effective of the three. Based on these findings, we use the framework to implement a reassignment model, NER-pc, which achieves F1 score improvements of 25% and 10% over the state-of-the-art methods. Further experiments verify the design of NER and the effectiveness of NER-pc.