JStrong: Malicious JavaScript detection based on code semantic representation and graph neural network

Yong Fang, Chaoyi Huang, Minchuan Zeng, Zhiying Zhao, Cheng Huang
DOI: https://doi.org/10.1016/j.cose.2022.102715
2022-07-01
Abstract: Web development technology has advanced significantly, and JavaScript has greatly enriched the interactive capabilities of the client. However, attackers exploit the dynamic features of the JavaScript language to embed malicious code into web pages for purposes such as smuggling and redirection. Traditional methods based on static feature detection struggle to detect obfuscated malicious code, while methods based on dynamic analysis are inefficient. To meet these challenges, this paper proposes JStrong, a static detection model based on a graph neural network. The model first generates an abstract syntax tree from the JavaScript source code and then adds data-flow and control-flow information to form a program dependency graph. We then embed the nodes and edges of the graph into feature vectors and learn the features of the whole graph through the graph neural network. We evaluate JStrong on a real-world dataset collected from top websites and GitHub and compare it with the state-of-the-art method. Experimental results show that JStrong achieves near-perfect classification performance and outperforms the state-of-the-art method.
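The abstract describes a pipeline of AST, then a program dependency graph with control-flow and data-flow edges, then node and edge embeddings, then a graph neural network producing a whole-graph representation. The sketch below illustrates only that general shape and is not JStrong's implementation: the abstract does not specify the GNN architecture or the embedding scheme. Everything here is an assumption for illustration: the graph for `var s = "..."; eval(s);` is built by hand rather than by a parser, `NODE_TYPES` uses a few ESTree-style node names, node features are one-hot vectors, `edge_weight` is a fixed hand-picked scaling per edge type, aggregation is a simplified mean over neighbours, readout is mean pooling, and the classifier weights are untrained, so the resulting score carries no meaning.

```python
import math
from typing import List, Tuple

# Hypothetical node vocabulary (ESTree-style names); not taken from the paper.
NODE_TYPES = ["Program", "CallExpression", "Identifier", "Literal", "VariableDeclarator"]

# Hypothetical fixed per-edge-type scaling; a real model would learn edge embeddings.
EDGE_WEIGHT = {"ast": 1.0, "control_flow": 0.5, "data_flow": 2.0}

Edge = Tuple[int, int, str]  # (source node, target node, edge type)


def one_hot(node_type: str) -> List[float]:
    """Encode a node type as a one-hot feature vector (all zeros if unknown)."""
    return [1.0 if t == node_type else 0.0 for t in NODE_TYPES]


def message_passing(features: List[List[float]], edges: List[Edge], rounds: int = 2) -> List[List[float]]:
    """Simplified message passing: each node averages its neighbours' (edge-type-scaled)
    features, adds its own features, and applies tanh. Edges are treated as undirected."""
    h = [list(f) for f in features]
    for _ in range(rounds):
        new_h = []
        for v in range(len(h)):
            agg = [0.0] * len(h[v])
            count = 0
            for src, dst, etype in edges:
                if dst == v:
                    u = src
                elif src == v:
                    u = dst
                else:
                    continue
                w = EDGE_WEIGHT[etype]
                agg = [a + w * x for a, x in zip(agg, h[u])]
                count += 1
            if count:
                agg = [a / count for a in agg]
            new_h.append([math.tanh(s + a) for s, a in zip(h[v], agg)])
        h = new_h
    return h


def readout(h: List[List[float]]) -> List[float]:
    """Mean-pool node embeddings into a single graph-level vector."""
    return [sum(col) / len(h) for col in zip(*h)]


def score(graph_vec: List[float], weights: List[float], bias: float = 0.0) -> float:
    """Logistic score for the 'malicious' class (weights here are untrained placeholders)."""
    z = sum(w * x for w, x in zip(weights, graph_vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hand-built graph for: var s = "..."; eval(s);
nodes = ["Program", "VariableDeclarator", "Identifier", "Literal",
         "CallExpression", "Identifier", "Identifier"]  # 2: s (decl), 5: eval, 6: s (use)
edges: List[Edge] = [
    (0, 1, "ast"), (1, 2, "ast"), (1, 3, "ast"),
    (0, 4, "ast"), (4, 5, "ast"), (4, 6, "ast"),
    (1, 4, "control_flow"),  # declaration executes before the call
    (2, 6, "data_flow"),     # value of s flows into eval's argument
]

features = [one_hot(t) for t in nodes]
graph_vec = readout(message_passing(features, edges))
prob = score(graph_vec, weights=[0.1, 0.8, 0.2, 0.3, 0.4])  # hypothetical, untrained
print(f"malicious score (untrained): {prob:.3f}")
```

In a trained model, the per-edge-type scaling, the update function, and the classifier weights would all be learned from labelled benign and malicious scripts; the point here is only how control-flow and data-flow edges let information from the literal reach the `eval` call site before the graph is pooled into one vector.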
computer science, information systems