Research and application of artificial intelligence based webshell detection model: A literature review

Mingrui Ma, Lansheng Han, Chunjie Zhou
2024-04-28
Abstract: Webshell, as the "culprit" behind numerous network attacks, is one of the research hotspots in the field of cybersecurity. However, the complexity, stealthiness, and confusing nature of webshells pose significant challenges to the corresponding detection schemes. With the rise of Artificial Intelligence (AI) technology, researchers have started to apply different intelligent algorithms and neural network architectures to the task of webshell detection. However, the related research still lacks a systematic and standardized methodological process, which is confusing and redundant. Therefore, following the development timeline, we carefully summarize the progress of relevant research in this field, dividing it into three stages: Start Stage, Initial Development Stage, and In-depth Development Stage. We further elaborate on the main characteristics and core algorithms of each stage. In addition, we analyze the pain points and challenges that still exist in this field and predict the future development trend of this field from our point of view. To the best of our knowledge, this is the first review that details the research related to AI-based webshell detection. It is also hoped that this paper can provide detailed technical information for more researchers interested in AI-based webshell detection tasks.
Cryptography and Security, Artificial Intelligence
### What problems does this paper attempt to solve?

This literature review addresses the lack of a systematic, standardized methodology in webshell detection based on artificial intelligence (AI). Specifically:

1. **Complexity and concealment of webshells**: As the "culprit" behind many cyberattacks, webshells are structurally complex, stealthy, and often obfuscated, which makes them hard for traditional rule-based or signature-based detection tools to identify.
2. **Deficiencies in existing research**: Many studies have applied different intelligent algorithms and neural network architectures to webshell detection, but the work still lacks a systematic and standardized methodological process, so the results are redundant and hard to compare.
3. **Challenges of detection methods**: Existing webshell detection methods face open problems in feature extraction, dataset balance, and model generalization, especially when handling large-scale data and previously unseen webshells.

### Main Objectives of the Paper

To address these problems, the paper reviews the progress of AI techniques in webshell detection, divides it into three development stages (the Start Stage, the Initial Development Stage, and the In-depth Development Stage), and analyzes the main characteristics and core algorithms of each stage. It also identifies the pain points and challenges in current research and predicts future development trends.

### Specific Content

- **Start Stage**: The first attempts to apply AI algorithms to webshell detection, including simple convolutional neural networks (CNNs) and character-level methods.
- **Initial Development Stage**: Since 2019, research has developed rapidly, and more refined AI methods have been applied to webshell detection, such as neural networks combined with attention mechanisms and the fusion of deep learning with traditional methods.
- **In-depth Development Stage**: Since the end of 2021, with the rise of BERT and its variants (such as CodeBERT), webshell detection has moved to more complex neural network architectures and pre-trained models.

### Conclusion

Through this systematic review and summary, the paper aims to provide detailed references and technical information for subsequent research and to promote the further development of AI techniques in webshell detection.
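
As an illustration of the character-level methods mentioned for the earliest of the three stages, the following is a minimal sketch of how a script might be turned into a fixed-length integer sequence before being fed to a CNN. It is not taken from the reviewed paper: the function name `encode_chars`, the `max_len` and `pad_id` parameters, and the sample script are all hypothetical choices made for this example.

```python
def encode_chars(source: str, max_len: int = 16, pad_id: int = 0) -> list[int]:
    """Map a script to a fixed-length sequence of byte IDs.

    Hypothetical helper: each UTF-8 byte b becomes the ID b + 1, so that
    pad_id (0 by default) never collides with a real byte.
    """
    # Truncate to max_len bytes so every sample has the same length.
    data = source.encode("utf-8", errors="replace")[:max_len]
    ids = [b + 1 for b in data]
    # Right-pad short scripts up to max_len.
    return ids + [pad_id] * (max_len - len(ids))


# Hypothetical one-line PHP sample, used only to exercise the encoder.
sample = "<?php eval($_POST['x']); ?>"
encoded = encode_chars(sample, max_len=32)
```

A sequence like `encoded` would typically pass through an embedding layer and one-dimensional convolutions. Working at the byte level avoids language-specific tokenization, which is one reason such encodings are a common starting point, though the stages described above move toward richer representations.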