Machine Learning-Based Analysis of Program Binaries: A Comprehensive Study

Hongfa Xue, Shaowen Sun, Guru Venkataramani, Tian Lan
DOI: https://doi.org/10.1109/access.2019.2917668
IF: 3.9
2019-01-01
IEEE Access
Abstract: Binary code analysis is crucial in various software engineering tasks, such as malware detection, code refactoring, and plagiarism detection. With the rapid growth of software complexity and the increasing number of heterogeneous computing platforms, binary analysis is more critical than ever. Traditional techniques for binary code analysis face multiple challenges, including the need for cross-platform analysis, high scalability and speed, and improved fidelity. To meet these challenges, machine learning-based binary code analysis frameworks have attracted substantial attention because they extract features automatically and drastically reduce the effort required to analyze large-scale programs. In this paper, we provide a taxonomy of machine learning-based binary code analysis, describe recent advances and key findings on the topic, and discuss the key challenges and opportunities. Finally, we present our thoughts on future directions for this topic.