NeuReduce: Reducing Mixed Boolean-Arithmetic Expressions by Recurrent Neural Network

Weijie Feng, Binbin Liu, Dongpeng Xu, Qilong Zheng, Yun Xu
DOI: https://doi.org/10.18653/v1/2020.findings-emnlp.56
2020-01-01
Abstract: Mixed Boolean-Arithmetic (MBA) expressions involve both arithmetic calculation (e.g., plus, minus, multiply) and bitwise computation (e.g., and, or, negate, xor). MBA expressions have been widely applied in software obfuscation, transforming programs from a simple form to a complex form. MBA expressions are challenging to simplify because the interleaving of bitwise and arithmetic operations renders mathematical reduction laws ineffective. Our goal is to recover the original, simple form from an obfuscated MBA expression. In this paper, we first propose NeuReduce, a string-to-string method based on neural networks that automatically learns to reduce complex MBA expressions. We develop a comprehensive MBA dataset, including one million diversified MBA expression samples and their corresponding simplified forms. After training on the dataset, NeuReduce can reduce complex MBA expressions to mathematically equivalent but concise forms. In a comparison with three state-of-the-art MBA reduction methods, our evaluation shows that NeuReduce outperforms all other tools in terms of accuracy, solving time, and performance overhead.
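To make the notion of an MBA expression concrete, the following minimal sketch compares a simple form with an obfuscated but equivalent one. It relies on the well-known identity x + y = (x ^ y) + 2 * (x & y). The 64-bit word size, the function names, and the random-testing helper are assumptions made for illustration; they are not taken from the paper, and random testing only gives evidence of equivalence, not a proof.

```python
import random

# Assumption for illustration: values behave like 64-bit machine words.
MASK = (1 << 64) - 1


def simple(x, y):
    """The concise form that a reduction tool would aim to recover."""
    return (x + y) & MASK


def obfuscated(x, y):
    """An MBA form mixing bitwise (^, &) and arithmetic (+, *) operators.

    Uses the identity x + y == (x ^ y) + 2 * (x & y).
    """
    return ((x ^ y) + 2 * (x & y)) & MASK


def agree_on_random_inputs(f, g, trials=1000, seed=0):
    """Return True if f and g agree on `trials` random 64-bit input pairs.

    Hypothetical helper: agreement is evidence of equivalence, not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.getrandbits(64)
        y = rng.getrandbits(64)
        if f(x, y) != g(x, y):
            return False
    return True


if __name__ == "__main__":
    print(agree_on_random_inputs(simple, obfuscated))
```

A string-to-string reducer in the sense of the abstract would take the text of the obfuscated form as input and emit the text of the simple form; a check like the one above could then be used to sanity-test a candidate output.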