Offensive AI: Enhancing Directory Brute-forcing Attack with the Use of Language Models

Alberto Castagnaro, Mauro Conti, Luca Pajola
2024-04-22
Abstract: Web Vulnerability Assessment and Penetration Testing (Web VAPT) is a comprehensive cybersecurity process that uncovers a range of vulnerabilities which, if exploited, could compromise the integrity of web applications. In a VAPT, it is common to perform a *Directory brute-forcing Attack*, aiming at the identification of accessible directories of a target website. Current commercial solutions are inefficient as they are based on brute-forcing strategies that use wordlists, resulting in enormous quantities of trials for a small amount of success. Offensive AI is a recent paradigm that integrates AI-based technologies in cyber attacks. In this work, we explore whether AI can enhance the directory enumeration process and propose a novel Language Model-based framework. Our experiments, conducted in a testbed consisting of 1 million URLs from different web application domains (universities, hospitals, government, companies), demonstrate the superiority of the LM-based attack, with an average performance increase of 969%.
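To make the wordlist-based baseline concrete, here is a minimal offline sketch of classic directory brute-forcing. It is an illustration under stated assumptions, not the setup of any specific tool or of the paper: the target site is simulated as a hypothetical in-memory set of paths (no network traffic is generated), and the function name `wordlist_bruteforce` and the example wordlist are illustrative.

```python
# Minimal, offline sketch of wordlist-based directory brute-forcing.
# Assumption: the "target site" is a hypothetical in-memory set of existing
# paths, so membership stands in for an HTTP response; nothing is sent.


def wordlist_bruteforce(existing_paths, wordlist, base="/"):
    """Try base + word + "/" for every word; return (found paths, request count)."""
    found = []
    requests = 0
    for word in wordlist:
        candidate = f"{base}{word}/"
        requests += 1  # each candidate costs one request in a real scan
        if candidate in existing_paths:  # stands in for a non-404 response
            found.append(candidate)
    return found, requests


if __name__ == "__main__":
    # Hypothetical university-like site structure.
    site = {"/admissions/", "/library/", "/research/", "/research/labs/"}
    # Generic wordlist that is not tailored to the type of site.
    wordlist = ["admin", "backup", "cgi-bin", "images", "library", "login",
                "old", "research", "test", "wp-admin"]
    found, requests = wordlist_bruteforce(site, wordlist)
    print(found, requests, f"hit rate = {len(found) / requests:.0%}")
```

On this toy site the generic wordlist spends ten requests to find two directories. Real wordlists contain thousands of entries and are applied blindly to every site, which is the "enormous quantities of trials for a small amount of success" that the abstract describes.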
Cryptography and Security
What problem does this paper attempt to address?
The paper addresses the inefficiency of current commercial solutions for directory brute-forcing attacks. Existing tools brute-force with wordlists, issuing a large number of requests for only a handful of successful results. Specifically, these traditional methods generate URL requests from predefined wordlists, which yields a low success rate and cannot adapt to the structure of different types of web applications. To address this, the paper proposes a new framework based on a language model (LM) that uses AI to make the directory enumeration process more efficient and effective. Its specific objectives are:

1. **Improve attack efficiency**: use the language model to generate URL paths that are more likely to exist, reducing the number of wasted requests.
2. **Adapt to different types of applications**: generate paths tailored to a given type of web application, based on knowledge extracted from similar websites.
3. **Adaptive decision-making**: during the attack, adjust the generated URLs according to the paths already discovered, maximizing the hit rate and cutting unnecessary requests.

### Specific problem description

#### Limitations of traditional methods

- **Inefficiency**: traditional methods rely on predefined wordlists to generate a large number of URL requests, but have a low success rate.
- **Lack of adaptability**: traditional methods cannot adjust to the specific structure of a web application, which produces many wasted attempts.

#### Advantages of the new method

- **High efficiency**: a language model generates URL paths that are more likely to exist, significantly improving attack efficiency.
- **Adaptability**: prior knowledge lets the attack fit different types of web application structures.
- **Adaptive decision-making**: the generated URLs are adjusted dynamically according to the discovered paths, further raising the success rate.

### Research contributions

The main contributions of the paper are:

1. A new dataset covering four types of web applications (commercial, government, hospital, university), totaling 1 million URLs.
2. Two new directory brute-forcing attacks based on prior knowledge: a probabilistic method and a language-model-based method.
3. A systematic evaluation showing that the LM-based attack substantially outperforms eight baseline methods, with an average performance improvement of 969%.

Overall, the paper shows how AI techniques can make directory brute-forcing attacks significantly more efficient and effective, offering new ideas and methods for network security research.
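The idea of generating paths from prior knowledge and adapting to discovered paths can be illustrated with a deliberately tiny stand-in for a language model: a first-order (bigram) model over URL path segments, trained on URLs from similar sites, which drives a best-first search that only proposes children of directories already confirmed to exist. This is a hedged sketch, not the paper's language model or its probabilistic method. The names `train_bigram`, `adaptive_enumerate`, `budget`, and `top_k`, as well as the example URLs, are hypothetical, and the target site is again simulated in memory.

```python
# Toy, hypothetical stand-in for LM-guided directory enumeration:
# a bigram model over URL path segments, learned from URLs of similar sites,
# drives a best-first search that only expands confirmed paths.
# The target site is simulated as an in-memory set; no requests are sent.
from collections import Counter, defaultdict
import heapq

START = "<root>"  # pseudo-segment that precedes the first path segment


def train_bigram(prior_urls):
    """Count how often segment b directly follows segment a in prior URLs."""
    counts = defaultdict(Counter)
    for url in prior_urls:
        segments = [s for s in url.strip("/").split("/") if s]
        prev = START
        for seg in segments:
            counts[prev][seg] += 1
            prev = seg
    return counts


def adaptive_enumerate(existing_paths, counts, budget, top_k=3):
    """Best-first search: query the top_k likeliest children of each found path."""
    found, requests = [], 0
    # Heap entries: (negated path probability, path, last segment).
    frontier = [(-1.0, "/", START)]
    while frontier and requests < budget:
        neg_prob, path, last = heapq.heappop(frontier)
        children = counts.get(last, Counter())
        total = sum(children.values())
        for seg, count in children.most_common(top_k):
            if requests >= budget:
                break
            candidate = f"{path}{seg}/"
            requests += 1  # one (simulated) request per candidate
            if candidate in existing_paths:  # stands in for a non-404 response
                found.append(candidate)
                # Adaptive step: only confirmed directories are expanded later,
                # with probability = parent probability * P(seg | last).
                heapq.heappush(frontier, (neg_prob * count / total, candidate, seg))
    return found, requests


if __name__ == "__main__":
    # Hypothetical URLs collected from other sites of the same type.
    prior = ["/research/labs/", "/research/publications/", "/library/catalog/",
             "/admissions/", "/research/labs/", "/library/"]
    site = {"/admissions/", "/library/", "/research/", "/research/labs/"}
    print(adaptive_enumerate(site, train_bigram(prior), budget=10))
```

On this toy data the search finds all four directories with six requests, because it never spends requests on children of paths that were not found and ranks candidates by how common they are on similar sites. That budget-aware, best-first expansion captures the intuition behind the adaptive step; the paper's framework uses an actual language model rather than segment counts.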