Abstract: This article proposes using mutual information values to replicate the expertise of security professionals in selecting features for detecting web attacks, with the goal of enhancing the effectiveness of web application firewalls (WAFs). Web applications are frequently vulnerable to various security threats, making WAFs essential for their protection. WAFs analyze HTTP traffic using rule-based approaches to identify known attack patterns and to detect and block potentially malicious requests. However, a major challenge is the occurrence of false positives, which can block legitimate traffic and disrupt the normal functioning of the application. The problem is addressed with an approach that combines supervised learning for feature selection with a semi-supervised learning scenario for training a One-Class SVM model. The experimental findings show that the model trained with features selected by the proposed algorithm outperformed the expert-based selection approach. The results obtained by the traditional rule-based WAF ModSecurity, configured with a vanilla set of OWASP CRS rules, were also improved upon.
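For reference, the standard definition of the mutual information between a discrete feature $X$ and the class label $Y$ (benign or attack) is shown below. It is zero exactly when $X$ and $Y$ are independent, and it grows as $X$ carries more information about $Y$. With base-2 logarithms the value is in bits. The abstract does not say how the authors estimate this quantity, so this is only the textbook definition and not necessarily their estimator.

$$
I(X;Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}
$$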

### What problems does this paper attempt to solve?

This paper aims to replicate the expertise of security experts in feature selection by using mutual information values, in order to improve the effectiveness of Web Application Firewalls (WAFs). Specifically, the authors address the following problems:

1. **Reduce the false positive rate**: WAFs often flag legitimate requests as malicious, which blocks legitimate traffic and disrupts the normal operation of applications. False positives not only degrade the user experience but can also interrupt business operations.
2. **Improve the accuracy of attack detection**: Web applications face a variety of security threats, and traditional rule-based WAFs struggle with new and complex attack patterns. A more effective way to distinguish legitimate requests from attack requests is therefore needed.
3. **Limitations of relying on expert knowledge**: Security experts currently select features mainly on the basis of experience and intuition. This can introduce subjective bias and makes it hard to capture emerging attack vectors or subtle attack patterns. A data-driven method is therefore needed to objectively identify and prioritize highly discriminative features.
4. **Generality and adaptability**: Existing WAF solutions usually have to be configured and tuned for a specific application, which makes them hard to apply to other applications. A general method that handles multiple attack types without relying on application-specific data is therefore needed.

To solve these problems, the authors propose a method that combines supervised and semi-supervised learning. Mutual information values are used for feature selection, and One-Class SVM models are trained on the selected features.
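The feature-selection step can be illustrated with a small sketch. This is an assumption-laden toy and not the authors' implementation. The three binary features (`has_quote`, `is_get`, `long_query`) and the eight labeled requests are hypothetical. They stand in for normal application traffic (label 0) and a generic attack dataset (label 1). Each feature is scored by its mutual information with the label, estimated directly from empirical counts, and the top `k` features are kept.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Plug-in estimate of I(X;Y) in bits for a discrete feature and label."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))  # counts of (x, y) pairs
    px = Counter(feature_values)                   # marginal counts of x
    py = Counter(labels)                           # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def select_top_k(rows, labels, k):
    """Score every feature by MI with the label and keep the k best."""
    scores = {
        name: mutual_information([row[name] for row in rows], labels)
        for name in rows[0]
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical feature vectors: 4 benign requests, then 4 attack requests.
rows = [
    {"has_quote": 0, "is_get": 1, "long_query": 0},
    {"has_quote": 0, "is_get": 1, "long_query": 0},
    {"has_quote": 0, "is_get": 0, "long_query": 0},
    {"has_quote": 0, "is_get": 0, "long_query": 1},
    {"has_quote": 1, "is_get": 1, "long_query": 1},
    {"has_quote": 1, "is_get": 1, "long_query": 1},
    {"has_quote": 1, "is_get": 0, "long_query": 0},
    {"has_quote": 1, "is_get": 0, "long_query": 1},
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

selected, scores = select_top_k(rows, labels, k=2)
print(selected)  # ['has_quote', 'long_query']
```

In this toy data `has_quote` separates the classes perfectly (1 bit), `long_query` is partially informative, and `is_get` is independent of the label (0 bits). The features kept this way would then form the input of the One-Class SVM, which is trained on normal traffic only. For continuous features, scikit-learn's `sklearn.feature_selection.mutual_info_classif` provides a nearest-neighbor-based estimator in place of the count-based estimate used here. The summary does not say which estimator the authors used.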
Experimental results show that the proposed method outperforms the approach based on expert-selected features. It also improves on the results of ModSecurity, a traditional rule-based WAF, configured with a vanilla set of OWASP CRS rules.

### Main contributions

1. **Introduce a diverse attack dataset**: A general dataset containing multiple attack types is used instead of application-specific data, which simplifies the feature selection process.
2. **Feature selection method based on mutual information values**: Using only normal application traffic and the above-mentioned attack dataset, features are selected by their mutual information values and then used to train One-Class SVM models. This approach performs well in anomaly detection and more effectively identifies the features that distinguish benign from malicious traffic.

Through these improvements, the research strengthens the security of web applications and enhances both the performance and the generality of WAF systems.
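Both comparisons above (against expert-selected features and against ModSecurity) come down to two quantities. The first is how many legitimate requests are blocked (false positives). The second is how many attacks are caught. A minimal sketch of these two rates follows. The predictions are made-up values, not results from the paper. The `+1`/`-1` encoding is an assumption borrowed from scikit-learn's `OneClassSVM.predict`, which returns `+1` for inliers and `-1` for outliers. The summary does not state which exact metrics the authors report.

```python
def detection_rates(predictions, labels):
    """Compare anomaly-detector decisions with ground-truth labels.

    predictions: +1 = allowed as normal (inlier), -1 = flagged as anomalous,
                 following the convention of scikit-learn's OneClassSVM.predict.
    labels:      0 = benign request, 1 = attack request.
    """
    tp = fp = tn = fn = 0
    for pred, label in zip(predictions, labels):
        flagged = pred == -1
        if label == 1:
            if flagged:
                tp += 1  # attack correctly blocked
            else:
                fn += 1  # attack missed
        else:
            if flagged:
                fp += 1  # legitimate request blocked: a false positive
            else:
                tn += 1  # legitimate request allowed
    fpr = fp / (fp + tn) if fp + tn else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    return {"false_positive_rate": fpr, "detection_rate": tpr}


# Hypothetical decisions for 4 benign and 4 attack requests (not paper results).
preds = [1, 1, 1, -1, -1, -1, 1, -1]
truth = [0, 0, 0, 0, 1, 1, 1, 1]
print(detection_rates(preds, truth))
# {'false_positive_rate': 0.25, 'detection_rate': 0.75}
```

Any detector's block/allow decisions can be mapped to the same encoding and scored with this function. That includes a rule-based WAF, so both approaches can be evaluated on the same held-out traffic.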

Capturing the security expert knowledge in feature selection for web application attack detection
