Webshell Detection Technology Based on Deep Learning

Ziheng Zhou, Lin Li, Xuan-Ying Zhao
DOI: https://doi.org/10.1109/BigDataSecurityHPSCIDS52275.2021.00020
2021-05-01
Abstract: In this paper, we use Long Short-Term Memory (LSTM) recurrent neural networks, a deep learning technique, to detect Webshells: trojan scripts written by hackers that pose serious security risks to web servers. We use deep learning to automatically extract features from the opcode sequences of malicious code written in PHP and study the resulting classification model. We compile PHP files into opcode sequences and build a Webshell detection model with LSTM, which includes word embedding conversion and a multi-layer LSTM structure. The trained single-layer model achieves over 95% accuracy in detecting Webshells. In contrast, the multi-layer models are less accurate: the double-layer model reaches 93% accuracy and the triple-layer model falls below 90%. More layers are therefore not always better, probably because of vanishing gradients or overfitting introduced by the additional layers. The results indicate that a single-layer model may perform best for Webshell detection.
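The preprocessing step between opcode extraction and word embedding can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how opcodes are tokenized, so the helper names `build_vocab` and `encode`, the `<pad>`/`<unk>` tokens, the fixed sequence length, and the two sample opcode sequences are all assumptions. The idea is that each opcode name is mapped to an integer index and every sequence is padded to the same length, which is the form an embedding layer in front of an LSTM typically expects.

```python
from collections import Counter

# Hypothetical special tokens: padding and unseen opcodes.
PAD, UNK = "<pad>", "<unk>"


def build_vocab(sequences, min_count=1):
    """Map each opcode name seen at least min_count times to an integer ID."""
    counts = Counter(op for seq in sequences for op in seq)
    vocab = {PAD: 0, UNK: 1}
    # Sort by opcode name so the assignment of IDs is deterministic.
    for op, n in sorted(counts.items()):
        if n >= min_count:
            vocab[op] = len(vocab)
    return vocab


def encode(seq, vocab, max_len):
    """Convert an opcode sequence to a fixed-length list of integer IDs."""
    ids = [vocab.get(op, vocab[UNK]) for op in seq[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Assumed sample data: opcode sequences as they might look after
# compiling two PHP files (names chosen for illustration only).
train = [
    ["INIT_FCALL", "SEND_VAR", "DO_FCALL", "RETURN"],
    ["INCLUDE_OR_EVAL", "RETURN"],
]

vocab = build_vocab(train)
batch = [encode(seq, vocab, max_len=5) for seq in train]
print(batch)
```

The resulting integer matrix would then be fed to an embedding layer followed by one or more LSTM layers; the number of layers is the variable the abstract compares.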
Computer Science