DeepSQLi: deep semantic learning for testing SQL injection

Muyang Liu, Ke Li, Tao Chen
DOI: https://doi.org/10.1145/3395363.3397375
2020-07-13
Abstract: Security is unarguably the most serious concern for Web applications, to which SQL injection (SQLi) attack is one of the most devastating attacks. Automatically testing SQLi vulnerabilities is of ultimate importance, yet is unfortunately far from trivial to implement. This is because of the existence of a huge, or potentially infinite, number of variants and semantic possibilities of SQL leading to SQLi attacks on various Web applications. In this paper, we propose a deep natural language processing based tool, dubbed DeepSQLi, to generate test cases for detecting SQLi vulnerabilities. By adopting a deep learning based neural language model and sequence-of-words prediction, DeepSQLi is equipped with the ability to learn the semantic knowledge embedded in SQLi attacks, allowing it to translate user inputs (or a test case) into a new test case, which is semantically related and potentially more sophisticated. Experiments are conducted to compare DeepSQLi with SQLmap, a state-of-the-art SQLi testing automation tool, on six real-world Web applications that are of different scales, characteristics and domains. Empirical results demonstrate the effectiveness and the remarkable superiority of DeepSQLi over SQLmap, such that more SQLi vulnerabilities can be identified by using fewer test cases, whilst running much faster.
What problem does this paper attempt to address?
This paper addresses the automatic detection of SQL injection (SQLi) vulnerabilities in Web applications. Specifically, traditional SQLi testing methods face the following challenges:

1. **Complexity of SQL Semantics**: SQLi attacks take diverse forms with complex semantics, so simple rules or a limited number of test cases cannot cover them comprehensively.
2. **Difficulty in Automatically Generating Test Cases**: Existing automated tools (such as SQLmap) can generate test cases, but they produce a large number of them and do so inefficiently, which makes it hard to discover new vulnerabilities.
3. **Insufficient Utilization of Semantic Knowledge**: Traditional methods mainly rely on software engineers manually specifying rules to generate test cases, which limits flexibility and is costly.

To solve these problems, the paper proposes DeepSQLi, a tool based on deep natural language processing (deep NLP). The tool automatically learns the semantic knowledge of SQLi attacks and generates varied test cases that are semantically related and maliciously effective. In this way, DeepSQLi can detect SQLi vulnerabilities more effectively while needing fewer test cases and running faster.

### Main Contributions

1. **End-to-End Automated Tool**: DeepSQLi is a fully automated end-to-end tool, trained using a customized neural language model (based on the Transformer architecture).
2. **Rich Training Data Set**: To help the model learn semantic knowledge, five mutation operators were developed to enrich the training data set.
3. **Diverse Test Case Generation**: By extending the beam search algorithm of the neural language model, DeepSQLi can generate multiple semantically related test cases, which increases the chances of finding vulnerabilities.
4. **Effectiveness Proven by Experiments**: Experiments were carried out on six real-world Web applications with different scales and characteristics. The results show that DeepSQLi outperforms the existing automated tool SQLmap in the number of detected vulnerabilities and in utilization rate, while running 6 times faster.

### Innovation Points

1. **Semantic Translation Ability**: DeepSQLi can translate normal user input into malicious input to form test cases, and it can also translate existing test cases into new, more complex ones.
2. **Continuous Adaptability**: If a generated test case fails to achieve a successful SQLi attack, it is fed back to the neural language model, so that the tool keeps adapting and the chances of finding unknown and deeply hidden vulnerabilities increase.

In conclusion, this paper significantly improves the efficiency and effectiveness of SQL injection vulnerability detection by introducing deep learning and natural language processing technologies.
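
To make the "Diverse Test Case Generation" contribution more concrete, the following is a minimal sketch of generic beam search that returns all of the surviving beams rather than only the single best sequence. It is not DeepSQLi's implementation: the `toy_next_token_probs` function, its tiny vocabulary, and its probabilities are hypothetical stand-ins for a trained neural language model, and the summary above does not specify how the paper extends beam search.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"  # end-of-sequence marker (illustrative)


def toy_next_token_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical stand-in for a trained language model (an assumption).

    Returns a probability distribution over the next token given a prefix.
    """
    table = {
        (): {"'": 0.6, "1": 0.4},
        ("'",): {"OR": 0.7, "AND": 0.3},
        ("1",): {"OR": 0.5, "AND": 0.5},
    }
    if prefix in table:
        return table[prefix]
    if prefix[-1] in ("OR", "AND"):
        return {"1=1": 0.8, "1=2": 0.2}
    return {EOS: 1.0}


def beam_search(
    next_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_width: int,
    max_len: int,
) -> List[Tuple[List[str], float]]:
    """Return up to `beam_width` sequences with their log-probabilities."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    finished: List[Tuple[Tuple[str, ...], float]] = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, p in next_probs(prefix).items():
                new_score = score + math.log(p)
                if token == EOS:
                    finished.append((prefix, new_score))
                else:
                    candidates.append((prefix + (token,), new_score))
        # Keep only the best `beam_width` partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if not beams:
            break
    # Sequences cut off at max_len still count as outputs.
    finished.extend(beams)
    finished.sort(key=lambda c: c[1], reverse=True)
    return [(list(seq), score) for seq, score in finished[:beam_width]]


if __name__ == "__main__":
    for tokens, log_prob in beam_search(toy_next_token_probs, beam_width=3, max_len=5):
        print(" ".join(tokens), round(math.exp(log_prob), 3))
```

With `beam_width=1` the search degenerates to greedy decoding and yields a single candidate; a wider beam yields several related candidates from the same model, which is the property the summary attributes to DeepSQLi's test case generation.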