Obfuscated Malicious Javascript Detection by Machine Learning

Jinkun Pan,Xiaoguang Mao
DOI: https://doi.org/10.2991/ameii-16.2016.157
2016-01-01
Abstract:In recent years, malicious JavaScript code has become more and more pervasive and been used by attackers to perform their attacks on the Web. To evade the detection of defense measures, various kinds of obfuscation techniques have been applied by the malicious script, taking advantage of the dynamic nature of JavaScript language. In this paper, we propose a new machine-learning based detection approach aiming at defeating such evasion attempts. Dynamic execution traces are recorded to capture all behaviors performed by the malicious script, including the dynamic generated code. Semantic-based deobfuscation is used to simplify the traces to get more concise and more essential instructions. None-ordered and none-concessive trace patterns are extracted from the deobfuscated traces to represent the intrinsic features for malicious scripts. We evaluated our approach with a large number of dataset collected from the Internet. The empirical results demonstrate that our approach is able to detect obfuscated malicious JavaScript code both effectively and efficiently.
What problem does this paper attempt to address?