Advancing SQL Injection Detection for High-Speed Data Centers: A Novel Approach Using Cascaded NLP

Kasim Tasdemir, Rafiullah Khan, Fahad Siddiqui, Sakir Sezer, Fatih Kurugollu, Sena Busra Yengec-Tasdemir, Alperen Bolat
2023-12-20
Abstract: Detecting SQL Injection (SQLi) attacks is crucial for web-based data center security, but it is challenging to balance accuracy and computational efficiency, especially in high-speed networks. Traditional methods struggle with this balance, while NLP-based approaches, although accurate, are computationally intensive. We introduce a novel cascade SQLi detection method, blending classical and transformer-based NLP models, that achieves 99.86% detection accuracy with significantly lower computational demands: it is 20 times faster than using transformer-based models alone. Our approach is tested in a realistic setting and compared with 35 other methods, including Machine Learning-based and transformer models like BERT, on a dataset of over 30,000 SQL sentences. Our results show that this hybrid method effectively detects SQLi in high-traffic environments, offering efficient and accurate protection against SQLi vulnerabilities. The code is available at https://github.com/gdrlab/cascaded-sqli-detection .
Cryptography and Security
What problem does this paper attempt to address?
This paper addresses several key problems in SQL injection (SQLi) attack detection, in particular the challenge of achieving both high accuracy and computational efficiency in high-speed data center environments. Specifically:

1. **Balancing accuracy and computational efficiency**: Traditional SQLi detection methods struggle to combine high accuracy with low computational cost. They often perform poorly in high-speed network environments where computational resources are limited.
2. **Improving the speed and accuracy of SQLi detection**: Existing methods rely either on static rule matching, which is prone to false positives and cannot identify new attack patterns, or on computationally intensive Transformer models, which are accurate but slow. A method is therefore needed that maintains high detection accuracy while significantly reducing computational overhead.
3. **Adapting to the complex requirements of modern data centers**: Modern data centers and data processing units (DPUs) handle very large volumes of network traffic, and existing SQLi detection techniques have difficulty operating efficiently in such environments. A solution is needed that integrates seamlessly into complex infrastructures and supports real-time detection and mitigation of SQLi attacks.

### Main contributions of the paper

To solve the above problems, the paper proposes a novel cascaded SQLi detection method that combines classical machine learning with Transformer-based natural language processing (NLP) techniques. Specific contributions include:

- **Proposing a cascaded SQLi detection model**: The model combines classical machine learning classifiers with Transformer-based NLP techniques and significantly improves detection speed while maintaining high detection accuracy.
- **Introducing and implementing the F1-efficiency (FE) metric**: A new performance metric that considers detection accuracy and inference speed together, allowing the model configuration to be adjusted according to user needs.
- **Comparing 35 different methods in detail**: The comparison covers machine learning classifiers and Transformer-based models, evaluating both classification performance and speed.
- **Integrating multiple ensemble models**: Ensembles that combine classical NLP features with machine learning models are also tested, broadening the experimental setup and the range of results.

### Method overview

The cascaded SQLi detection system proposed in the paper has two stages:

- **First stage**: A fast classical machine learning classifier (such as a Passive Aggressive Classifier) quickly screens out most non-malicious samples, reducing the computational burden.
- **Second stage**: Samples marked as suspicious in the first stage are passed to the Transformer-based model for re-analysis, which further reduces false positives.

With this two-stage design, the system maintains high detection accuracy (99.86%) while running 20 times faster than using Transformer-based models alone, which makes it suitable for handling SQLi attacks in high-speed data center environments.

In short, the paper aims to provide an efficient and accurate SQLi detection solution, especially suited to high-speed network environments with limited computational resources.
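The two-stage routing described in the method overview can be sketched in a few lines of Python. This is only an illustration of the cascade control flow, not the authors' implementation: `fast_stage` and `slow_stage` are hypothetical keyword-based stand-ins for the classical classifier and the Transformer model, and `expected_speedup` is an assumed simple cost model (every sample pays the fast-stage cost, forwarded samples also pay the slow-stage cost) that does not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical tokens used by the stand-in first stage (not from the paper).
SUSPICIOUS_TOKENS = ("' or ", "union select", "--", "sleep(")


def fast_stage(query: str) -> bool:
    """Cheap screen (stand-in for a classical classifier).

    Returns True when the query looks suspicious and should be forwarded.
    """
    q = query.lower()
    return any(token in q for token in SUSPICIOUS_TOKENS)


def slow_stage(query: str) -> bool:
    """Stricter check (stand-in for the expensive Transformer model).

    Returns True when the query is judged malicious.
    """
    q = query.lower()
    return "union select" in q or ("' or " in q and "'='" in q)


@dataclass
class CascadeResult:
    labels: List[bool]  # True = malicious, one entry per input query
    forwarded: int      # number of queries that reached the second stage


def cascade_detect(
    queries: List[str],
    fast: Callable[[str], bool],
    slow: Callable[[str], bool],
) -> CascadeResult:
    """Run the fast stage on everything; run the slow stage only on suspects."""
    labels: List[bool] = []
    forwarded = 0
    for query in queries:
        if not fast(query):
            labels.append(False)  # screened out as benign by the first stage
            continue
        forwarded += 1
        labels.append(slow(query))  # second stage makes the final decision
    return CascadeResult(labels, forwarded)


def expected_speedup(fast_cost: float, slow_cost: float, forward_rate: float) -> float:
    """Speedup over running the slow model on every sample (assumed cost model)."""
    return slow_cost / (fast_cost + forward_rate * slow_cost)


SAMPLE_QUERIES = [
    "SELECT * FROM users WHERE id = 5",
    "SELECT * FROM users WHERE name = '' OR '1'='1'",
    "SELECT note FROM memos WHERE title = 'pros -- cons'",
    "SELECT id FROM t UNION SELECT password FROM users",
    "UPDATE cart SET qty = 2 WHERE id = 9",
]

if __name__ == "__main__":
    result = cascade_detect(SAMPLE_QUERIES, fast_stage, slow_stage)
    print(result.labels, result.forwarded)
```

In this toy run the third query is flagged by the first stage because of the `--` token and then cleared by the second stage, which mirrors the false-positive reduction role described above. Under the assumed cost model, the speedup grows as the first stage gets cheaper and as the share of traffic it forwards shrinks.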