Capturing the security expert knowledge in feature selection for web application attack detection

Amanda Riverol, Gustavo Betarte, Rodrigo Martínez, Álvaro Pardo
2024-07-26
Abstract: This article puts forward the use of mutual information values to replicate the expertise of security professionals in selecting features for detecting web attacks. The goal is to enhance the effectiveness of web application firewalls (WAFs). Web applications are frequently vulnerable to various security threats, making WAFs essential for their protection. WAFs analyze HTTP traffic using rule-based approaches to identify known attack patterns and to detect and block potentially malicious requests. However, a major challenge is the occurrence of false positives, which can lead to blocking legitimate traffic and impact the normal functioning of the application. The problem is addressed with an approach that combines supervised learning for feature selection with a semi-supervised learning scenario for training a One-Class SVM model. The experimental findings show that the model trained with features selected by the proposed algorithm outperformed the expert-based selection approach. Additionally, the results obtained by the traditional rule-based WAF ModSecurity, configured with a vanilla set of OWASP CRS rules, were also improved.
Cryptography and Security, Artificial Intelligence
What problem does this paper attempt to address?
### What problems does this paper attempt to solve?

This paper aims to replicate the expertise of security experts in feature selection by using mutual information values, with the goal of improving the effectiveness of Web Application Firewalls (WAFs). Specifically, the authors attempt to solve the following problems:

1. **Reduce the false positive rate**
   - WAFs often raise false positives when flagging potentially malicious requests. This blocks legitimate traffic and disrupts the normal operation of the application, which degrades the user experience and may cause business interruptions.
2. **Improve the accuracy of attack detection**
   - Web applications face a variety of security threats, and traditional rule-based WAFs have difficulty dealing with new and complex attack patterns. A more effective method is needed to distinguish legitimate requests from attack requests.
3. **Overcome the limitations of relying on expert knowledge**
   - Security experts currently rely mainly on experience and intuition when selecting features. This can introduce subjective bias and makes it hard to capture emerging attack vectors or subtle attack patterns. A data-driven method is needed to objectively identify and prioritize the most discriminative features.
4. **Generality and adaptability**
   - Existing WAF solutions usually have to be configured and tuned for a specific application, which makes them hard to reuse for other applications. A general method is needed that can handle multiple types of attacks without relying on application-specific attack data.

To solve these problems, the authors propose a method that combines supervised and semi-supervised learning: mutual information values are used for feature selection, and the selected features are used to train One-Class SVM models. Experimental results show that this method outperforms the approach based on expert-selected features and also improves on the results of the traditional rule-based WAF ModSecurity configured with a vanilla set of OWASP CRS rules.

### Main contributions

1. **Introduce a diverse attack dataset**
   - A general dataset containing multiple attack types is used instead of application-specific attack data, which simplifies the feature selection process.
2. **Feature selection method based on mutual information values**
   - Using only normal application traffic and the attack dataset above, features are selected by their mutual information values and then used to train One-Class SVM models. This method performs well in anomaly detection and more effectively identifies the features that distinguish benign from malicious traffic.

Through these improvements, the research improves the security of web applications and enhances the performance and generality of WAF systems.
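The mutual-information ranking step described above can be illustrated with a minimal sketch. It assumes binary features and uses the standard definition $I(X;Y) = \sum_{x,y} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}$ estimated from counts. The feature names, the toy data, and the `TOP_K` cut-off are hypothetical and chosen only for illustration; this summary does not specify the features, the estimator, or the threshold used by the authors.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from paired samples of two discrete variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over the observed pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def rank_features(columns, labels):
    """Return (feature, MI) pairs sorted by decreasing MI with the label."""
    scores = {
        name: mutual_information(values, labels)
        for name, values in columns.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): 4 normal requests (label 0), then 4 attacks (label 1).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "has_cookie_header":      [0, 1, 0, 1, 0, 1, 0, 1],
    "contains_angle_bracket": [0, 0, 0, 1, 1, 1, 1, 0],
    "contains_sql_keyword":   [0, 0, 0, 0, 1, 1, 1, 1],
}

ranked = rank_features(columns, labels)
TOP_K = 2  # hypothetical cut-off, not taken from the paper
selected = [name for name, _ in ranked[:TOP_K]]

print(ranked)
print(selected)
```

In this sketch the labeled data (normal traffic plus a generic attack dataset) is used only to rank features. The selected features would then be extracted from normal traffic alone to fit the one-class model, for instance with a library implementation such as scikit-learn's `sklearn.svm.OneClassSVM`; which implementation the authors used is not stated here.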