Input-Output Example-Guided Data Deobfuscation on Binary

Yujie Zhao, Zhanyong Tang, Guixin Ye, Xiaoqing Gong, Dingyi Fang
DOI: https://doi.org/10.1155/2021/4646048
IF: 1.968
2021-01-01
Security and Communication Networks
Abstract: Malicious software commonly uses data obfuscation to evade detection and hinder reverse engineering. To analyze such malware, the obfuscation must be removed so that the program is restored to a more readable form (deobfuscation). Deobfuscation based on program synthesis is well suited to this task because it treats the target program as a black box: deobfuscation becomes the problem of finding the shortest instruction sequence that has the same input-output behavior as the target program. Existing work has two limitations: it assumes that the locations of the obfuscated code snippets in the target program are already known, and it relies on a stochastic search algorithm, which results in low efficiency. In this paper, we propose a fine-grained obfuscation detection method that uses machine learning to locate obfuscated code snippets. We also combine program synthesis with a heuristic search algorithm, Nested Monte Carlo Search. We have applied a prototype implementation of our ideas to data obfuscation produced by different tools, including OLLVM and Tigress. Our experimental results suggest that this approach is highly effective in locating and deobfuscating binaries with data obfuscation, with an accuracy of at least 90.34%. Compared with the state-of-the-art deobfuscation technique, our approach improves efficiency by 75% and the success rate by 5%.
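The black-box formulation in the abstract can be illustrated with a toy example. The sketch below is not the paper's method: it replaces Nested Monte Carlo Search with a plain shortest-first enumeration, and the obfuscated function, the operator set, and the sample inputs are all hypothetical choices made for illustration. It only shows the idea of searching for the shortest expression that agrees with the target on a set of input-output examples.

```python
from itertools import product

MASK = 0xFFFFFFFF  # assumption: model 32-bit register arithmetic

def obfuscated(x, y):
    # Hypothetical black-box target. (x ^ y) + 2 * (x & y) is a
    # well-known mixed Boolean-arithmetic encoding of x + y.
    return ((x ^ y) + 2 * (x & y)) & MASK

# Hypothetical operator set available to the synthesizer.
OPS = {
    "+": lambda a, b: (a + b) & MASK,
    "-": lambda a, b: (a - b) & MASK,
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
}

def expressions(size):
    """Yield (text, function) pairs for every expression with `size` leaves."""
    if size == 1:
        yield "x", lambda x, y: x
        yield "y", lambda x, y: y
        return
    for left_size in range(1, size):
        lefts = list(expressions(left_size))
        rights = list(expressions(size - left_size))
        for (lt, lf), (rt, rf) in product(lefts, rights):
            for sym, op in OPS.items():
                # Bind loop variables as defaults so each lambda keeps its own.
                yield (f"({lt} {sym} {rt})",
                       lambda x, y, lf=lf, rf=rf, op=op: op(lf(x, y), rf(x, y)))

def synthesize(target, samples, max_size=3):
    """Return the first smallest expression matching `target` on all samples."""
    expected = [target(x, y) for x, y in samples]
    for size in range(1, max_size + 1):  # shortest candidates first
        for text, fn in expressions(size):
            if [fn(x, y) for x, y in samples] == expected:
                return text
    return None

SAMPLES = [(0, 0), (1, 2), (7, 9), (0xFFFFFFFF, 1), (123456, 654321)]
RESULT = synthesize(obfuscated, SAMPLES)
print(RESULT)  # (x + y)
```

Exhaustive enumeration grows exponentially with expression size, which is why practical tools guide the search with a stochastic or heuristic strategy. A match on a finite sample set also does not prove equivalence, so a recovered expression would still need to be validated on further inputs.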