A Classification-by-Retrieval Framework for Few-Shot Anomaly Detection to Detect API Injection Attacks

Udi Aharon, Ran Dubin, Amit Dvir, Chen Hajaj
2024-09-15
Abstract: Application Programming Interface (API) Injection attacks refer to the unauthorized or malicious use of APIs, which are often exploited to gain access to sensitive data or manipulate online systems for illicit purposes. Identifying actors that deceitfully utilize an API poses a demanding problem. Although there have been notable advancements and contributions in the field of API security, there remains a significant challenge when dealing with attackers who use novel approaches that don't match the well-known payloads commonly seen in attacks. Also, attackers may exploit standard functionalities unconventionally and with objectives surpassing their intended boundaries. Thus, API security needs to be more sophisticated and dynamic than ever, with advanced computational intelligence methods, such as machine learning models that can quickly identify and respond to abnormal behavior. In response to these challenges, we propose a novel unsupervised few-shot anomaly detection framework composed of two main parts: First, we train a dedicated generic language model for API based on FastText embedding. Next, we use Approximate Nearest Neighbor search in a classification-by-retrieval approach. Our framework allows for training a fast, lightweight classification model using only a few examples of normal API requests. We evaluated the performance of our framework using the CSIC 2010 and ATRDF 2023 datasets. The results demonstrate that our framework improves API attack detection accuracy compared to the state-of-the-art (SOTA) unsupervised anomaly detection baselines.
Cryptography and Security
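The abstract describes a two-stage pipeline: embed API requests with a FastText-style language model, then classify a new request by retrieving its nearest neighbours among a few normal examples. The sketch below illustrates that idea in plain Python under several assumptions. The tokenizer's delimiter set is hypothetical (the paper's own API tokenizer is not reproduced here). The embedding averages fixed pseudo-random vectors assigned to hashed character n-grams, which mimics FastText's subword hashing but involves no training. Retrieval is an exact brute-force scan rather than an approximate nearest neighbor index. The names `RetrievalDetector`, `embed`, `tokenize`, and the 0.75 threshold are illustrative, not taken from the paper.

```python
import hashlib
import math
import random
import re

DIM = 512          # embedding width (illustrative choice)
BUCKETS = 2 ** 16  # hash buckets for character n-grams (FastText-style hashing trick)
_bucket_vectors: dict[int, list[float]] = {}


def tokenize(request: str) -> list[str]:
    """Hypothetical API tokenizer: lowercase, split on whitespace and URL delimiters."""
    return [t for t in re.split(r"[\s/?&=:.,;]+", request.lower()) if t]


def char_ngrams(token: str, n_min: int = 3, n_max: int = 5) -> list[str]:
    """FastText-style subword units: the whole padded token plus its character n-grams."""
    padded = f"<{token}>"
    grams = [padded]
    for n in range(n_min, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def bucket_vector(gram: str) -> list[float]:
    """Fixed pseudo-random vector per hash bucket (untrained stand-in for learned vectors)."""
    bucket = int(hashlib.sha256(gram.encode()).hexdigest(), 16) % BUCKETS
    if bucket not in _bucket_vectors:
        rng = random.Random(bucket)  # seeded, so vectors are reproducible across runs
        _bucket_vectors[bucket] = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    return _bucket_vectors[bucket]


def embed(request: str) -> list[float]:
    """Average the subword vectors of every token in the request."""
    total = [0.0] * DIM
    count = 0
    for token in tokenize(request):
        for gram in char_ngrams(token):
            for i, value in enumerate(bucket_vector(gram)):
                total[i] += value
            count += 1
    return [v / count for v in total] if count else total


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class RetrievalDetector:
    """Classification by retrieval: score a request by its most similar normal example."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # hypothetical decision threshold
        self.index: list[list[float]] = []

    def add(self, request: str) -> None:
        """Incremental update: append one normal request to the index."""
        self.index.append(embed(request))

    def score(self, request: str) -> float:
        """Highest cosine similarity to any indexed request (exact search, not ANN)."""
        if not self.index:
            raise ValueError("index is empty; add normal requests first")
        query = embed(request)
        return max(cosine(query, vec) for vec in self.index)

    def is_anomalous(self, request: str) -> bool:
        return self.score(request) < self.threshold


if __name__ == "__main__":
    detector = RetrievalDetector(threshold=0.75)
    for normal in ["GET /api/users/42", "GET /api/users/7", "GET /api/users/19?fields=name"]:
        detector.add(normal)  # only a few normal examples
    for query in ["GET /api/users/88",
                  "GET /api/users/42 UNION SELECT password FROM admins --"]:
        print(f"{detector.score(query):.3f}  anomalous={detector.is_anomalous(query)}")
```

Because the index is a plain list, adding a newly observed normal request is a single `add` call, which loosely mirrors the incremental index updates listed among the paper's contributions. A real deployment would replace the linear scan with an approximate nearest neighbor library and a trained embedding model.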
What problem does this paper attempt to address?
The main problem this paper addresses is the detection of API injection attacks, in particular zero-day vulnerabilities and previously unseen attack patterns. To that end, it proposes a new unsupervised few-shot anomaly detection framework aimed at three challenges:

1. **Detecting unknown attacks (zero-day vulnerabilities)**: Traditional security solutions such as web application firewalls cannot recognize vulnerabilities they have not seen before, so a method that adapts quickly to new attack patterns is needed.
2. **Reducing false positives**: In practice, false alarms erode trust in the system, so detection accuracy must be high.
3. **Real-time, continuous protection**: As the API economy grows, APIs are called more often and in more complex ways, so the detection system must provide continuous, real-time protection.

### Core contributions of the paper

To address these challenges, the paper proposes a classification-by-retrieval framework (FT-ANN) built on FastText embeddings and approximate nearest neighbor (ANN) search. Its main features are:

1. **Novel unsupervised few-shot anomaly detection framework**: The model is trained on only a small number of normal API requests and can still make accurate predictions on new, unseen samples.
2. **Purpose-built tokenizer**: A tokenizer designed for the distinctive language structure of API requests captures and emphasizes their key features, addressing challenges specific to applying natural language processing to APIs.
3. **Efficient classification-by-retrieval**: Combining FastText embeddings with ANN search yields a lightweight classification model that reduces the number of models required and supports incremental index updates.
4. **Domain-independent language model**: The language model is not tied to a single API domain, so the framework can switch seamlessly between API domains without retraining, which improves flexibility and generalization.

### Experimental verification

The framework is evaluated on the public HTTP datasets CSIC 2010 and ATRDF 2023. The results show that it achieves higher API attack detection accuracy than state-of-the-art unsupervised anomaly detection baselines.

### Formulas

The paper scores requests with cosine similarity:

\[ \text{Cosine Similarity} = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \|\mathbf{B}\|} \]

where $\mathbf{A}$ and $\mathbf{B}$ are two vectors, $\mathbf{A} \cdot \mathbf{B}$ is their dot product, and $\|\mathbf{A}\|$ and $\|\mathbf{B}\|$ are their magnitudes. It also applies maximum distance scaling:

\[ X' = 1 - \left( \frac{X}{\max(X)} \right) \]

where $X$ is the similarity score and $X'$ is the normalized score. For non-negative scores, dividing by $\max(X)$ maps them into $[0, 1]$, and subtracting from 1 reverses their order, so the largest raw score maps to 0.

Together, these components give an efficient and accurate scheme for detecting API injection attacks in modern, complex API environments.
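The cosine similarity and maximum distance scaling formulas translate directly into code. The following is a minimal Python sketch; the function names `cosine_similarity` and `max_scale` are hypothetical, and taking $\max(X)$ over the batch of scores being normalized is an assumption about how the scaling is applied, not a detail confirmed by the summary.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """(A . B) / (||A|| ||B||) for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def max_scale(scores: list[float]) -> list[float]:
    """X' = 1 - X / max(X), applied to every score in a batch (assumed batch-wise max)."""
    peak = max(scores)
    if peak <= 0:
        raise ValueError("max(X) must be positive")
    return [1.0 - x / peak for x in scores]


if __name__ == "__main__":
    print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # ≈ 0.7071, i.e. 1/sqrt(2)
    print(max_scale([0.2, 0.5, 1.0]))                  # ≈ [0.8, 0.5, 0.0]
```

Note that cosine similarity ignores vector length, so `[2, 3]` and `[4, 6]` score 1.0; only the direction of the embeddings matters.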