Multi-language Webshell Detection Based on Abstract Syntax Tree and TreeLSTM

Mengchuan Shang, Xueying Han, Changzhi Zhao, Zelin Cui, Dan Du, Bo Jiang
DOI: https://doi.org/10.1109/cscwd61410.2024.10580271
2024-01-01
Abstract: A Webshell is a command execution environment residing in a web container, which attackers use to remotely control servers and illegally access website resources. Accurately detecting Webshells is of great significance for maintaining web security. Current research faces two main challenges. On the one hand, Webshells rely heavily on obfuscation to evade detection, and existing methods often operate on source code or opcodes, which cannot fully exploit the semantic and syntactic information in Webshell code. On the other hand, Webshells can be written in any web application programming language, while most existing methods detect only one or a few types of Webshells. This paper proposes a novel approach called WS-Tree, which captures the semantics and syntax of Webshells by using abstract syntax trees (ASTs) as input features. A TreeLSTM model serves as the encoder and handles the relationships between nodes in the syntax tree, thereby enabling the detection of obfuscated and multi-language Webshells. We also propose a new Webshell dataset containing both obfuscated and non-obfuscated samples to prevent dataset leakage. Extensive experiments demonstrate that the proposed model outperforms state-of-the-art baselines across different Webshell programming languages and improves model generalizability.
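To make the AST-plus-TreeLSTM idea concrete, the following is a minimal sketch and not the authors' WS-Tree implementation. It rests on several assumptions that the abstract does not confirm: Python's built-in `ast` module stands in for the web-language parsers, only node type names are used as input features, the Child-Sum TreeLSTM variant is used, and the weights are random and untrained. The names `to_tree`, `ChildSumTreeLSTM`, and the hidden size of 4 are hypothetical. The sketch shows only that identifier renaming, one simple form of obfuscation, leaves a node-type tree and therefore its encoding unchanged.

```python
import ast
import math
import random


def to_tree(node):
    """Reduce a Python AST node to (node_type_name, [child_trees])."""
    return (type(node).__name__,
            [to_tree(child) for child in ast.iter_child_nodes(node)])


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ChildSumTreeLSTM:
    """Untrained Child-Sum TreeLSTM cell with random weights (illustration only)."""

    def __init__(self, hidden=4, seed=0):
        self.hidden = hidden                  # hypothetical, tiny hidden size
        self.rng = random.Random(seed)
        self.embed = {}                       # node type -> input vector (lazy)
        # One (W, U, b) triple per gate: input, forget, output, update.
        self.params = {gate: (self._mat(), self._mat(), [0.0] * hidden)
                       for gate in "ifou"}

    def _vec(self):
        return [self.rng.uniform(-0.5, 0.5) for _ in range(self.hidden)]

    def _mat(self):
        return [self._vec() for _ in range(self.hidden)]

    def _gate(self, name, x, h, activation):
        # activation(W x + U h + b), computed row by row.
        W, U, b = self.params[name]
        n = self.hidden
        return [activation(sum(W[r][k] * x[k] + U[r][k] * h[k]
                               for k in range(n)) + b[r])
                for r in range(n)]

    def encode(self, tree):
        """Return (h, c) for the subtree rooted at `tree`, bottom-up."""
        label, children = tree
        if label not in self.embed:
            self.embed[label] = self._vec()
        x = self.embed[label]
        n = self.hidden
        states = [self.encode(child) for child in children]
        # Child-Sum: the gates see the sum of the children's hidden states.
        h_sum = [sum(h_k[d] for h_k, _ in states) for d in range(n)]
        i = self._gate("i", x, h_sum, sigmoid)
        o = self._gate("o", x, h_sum, sigmoid)
        u = self._gate("u", x, h_sum, math.tanh)
        c = [i[d] * u[d] for d in range(n)]
        # One forget gate per child, applied to that child's memory cell.
        for h_k, c_k in states:
            f = self._gate("f", x, h_k, sigmoid)
            c = [c[d] + f[d] * c_k[d] for d in range(n)]
        h = [o[d] * math.tanh(c[d]) for d in range(n)]
        return h, c


model = ChildSumTreeLSTM()
plain = "cmd = input()\nexec(cmd)"
renamed = "x9_q = input()\nexec(x9_q)"   # same code, identifier renamed
other = "print(1 + 2)"                   # structurally different code

h_plain, _ = model.encode(to_tree(ast.parse(plain)))
h_renamed, _ = model.encode(to_tree(ast.parse(renamed)))
h_other, _ = model.encode(to_tree(ast.parse(other)))

print(h_plain == h_renamed)   # compares the two root encodings
print(h_plain == h_other)
```

Because `plain` and `renamed` reduce to the same node-type tree, their root encodings are identical by construction. Heavier obfuscation such as string encoding or dynamic function construction does change the tree, and handling those cases is what a trained encoder would have to learn; this sketch makes no claim about that.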