Towards Generic Deobfuscation of Windows API Calls

Vadim Kotov, Michael Wojnowicz
DOI: https://doi.org/10.48550/arXiv.1802.04466
2020-12-06
Abstract: A common way to get insight into a malicious program's functionality is to look at which API functions it calls. To complicate the reverse engineering of their programs, malware authors deploy API obfuscation techniques, hiding them from analysts' eyes and anti-malware scanners. This problem can be partially addressed by using dynamic analysis; that is, by executing a malware sample in a controlled environment and logging the API calls. However, malware that is aware of virtual machines and sandboxes might terminate without showing any signs of malicious behavior. In this paper, we introduce a static analysis technique allowing generic deobfuscation of Windows API calls. The technique utilizes symbolic execution and hidden Markov models to predict API names from the arguments passed to the API functions. Our best prediction model can correctly identify API names with 87.60% accuracy.
Cryptography and Security, Applications
What problem does this paper attempt to address?
This paper addresses the obfuscation of API calls in malware. To make reverse engineering harder, malware authors use various techniques to hide or obfuscate the API calls their programs make, so that analysts cannot recover those calls directly through static analysis. This hampers both anti-malware scanners and security analysts, because knowing which API functions a malicious program calls provides important clues about its functionality. To address this, the paper proposes a static analysis technique aimed at generic deobfuscation of Windows API calls. The technique combines symbolic execution with hidden Markov models (HMMs) to infer the name of an API function from the arguments passed to it, thereby bypassing the obfuscation applied by the malware author. The main contribution is an automated method that predicts obfuscated API function names with high accuracy by analyzing the arguments passed at each call. The reported prediction accuracy is 73.18% when the number of arguments is known and 87.60% when it is unknown; the 87.60% figure is the best-model accuracy quoted in the abstract. Predictions of this kind can make malware analysis faster and more accurate.
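The idea of predicting an API name from its arguments can be illustrated with a minimal sketch. It assumes that symbolic execution has already recovered the arguments of an obfuscated call and reduced each one to a coarse symbol; the alphabet used here (`str`, `ptr`, `int`, `zero`), the two toy models, and all probabilities are invented for illustration and are not taken from the paper. Each candidate API gets its own small HMM, the argument sequence is scored against every model with the forward algorithm, and the highest-scoring API name is returned.

```python
import math

FLOOR = 1e-6  # assumed smoothing value for symbols a state never emits


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) using the scaled forward algorithm."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n_states = len(start)
    # Initialise with the start distribution and the first emission.
    alpha = [start[s] * emit[s].get(obs[0], FLOOR) for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then emit obs[t].
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states))
                * emit[s].get(obs[t], FLOOR)
                for s in range(n_states)
            ]
        # Rescale to avoid underflow and accumulate the log of the scale.
        scale = sum(alpha)
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


# Hypothetical hand-written models; real models would be trained on
# argument sequences observed for each API.
MODELS = {
    "Sleep": {
        "start": [1.0],
        "trans": [[1.0]],
        "emit": [{"int": 0.9, "zero": 0.1}],
    },
    "CreateFileA": {
        "start": [1.0, 0.0],
        "trans": [[0.1, 0.9], [0.0, 1.0]],
        "emit": [
            {"str": 0.9, "ptr": 0.1},               # file-name-like argument
            {"int": 0.5, "zero": 0.4, "ptr": 0.1},  # flags, handles, NULLs
        ],
    },
}


def predict_api(arg_symbols):
    """Score the argument symbols against every model and pick the best."""
    scores = {
        name: forward_log_likelihood(arg_symbols, m["start"], m["trans"], m["emit"])
        for name, m in MODELS.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    best, scores = predict_api(["str", "int", "zero", "zero", "int", "zero", "zero"])
    print(best, scores)
```

Keeping one model per API and comparing likelihoods means that adding a new API only requires adding one more model; the number of arguments does not need to be fixed in advance, because the forward algorithm scores sequences of any length.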