LLMs are One-Shot URL Classifiers and Explainers

Fariza Rashid, Nishavi Ranaweera, Ben Doyle, Suranga Seneviratne
2024-09-22
Abstract: Malicious URL classification represents a crucial aspect of cyber security. Although existing work comprises numerous machine learning and deep learning-based URL classification models, most suffer from generalisation and domain-adaptation issues arising from the lack of representative training datasets. Furthermore, these models fail to provide explanations for a given URL classification in natural human language. In this work, we investigate and demonstrate the use of Large Language Models (LLMs) to address this issue. Specifically, we propose an LLM-based one-shot learning framework that uses Chain-of-Thought (CoT) reasoning to predict whether a given URL is benign or phishing. We evaluate our framework using three URL datasets and five state-of-the-art LLMs and show that one-shot LLM prompting indeed provides performances close to supervised models, with GPT 4-Turbo being the best model, followed by Claude 3 Opus. We conduct a quantitative analysis of the LLM explanations and show that most of the explanations provided by LLMs align with the post-hoc explanations of the supervised classifiers, and the explanations have high readability, coherency, and informativeness.
Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses two key challenges in phishing URL classification: generalization and interpretability.

1. **Generalization**:
   - Existing machine-learning and deep-learning models suffer a significant performance decline on test URLs drawn from a different data source than the training data. The main cause is unrepresentative training datasets, which make it difficult for a model to adapt to new, unseen data.
   - This weak generalization stems from inherent dataset bias. For example, an organization's dataset is biased towards the URLs its employees visit frequently.
2. **Interpretability (explainability)**:
   - Existing URL classification models are usually black boxes and cannot explain in natural language why a URL is classified as benign or malicious. Without such explanations, users struggle to understand the basis for a classification, which reduces their trust.
   - Easy-to-understand explanations are crucial for improving users' security awareness, especially when the false-positive rate is high.

To address these issues, the paper proposes a one-shot learning framework based on large language models (LLMs). The framework uses Chain-of-Thought (CoT) reasoning to predict whether a URL is phishing and provides a natural-language explanation for each classification. In this way, the paper aims to improve both the generalization and the interpretability of URL classification, thereby better protecting users from phishing attacks.

### The specific contributions of the paper include:

- **Proposing an LLM-based framework** that combines CoT reasoning and one-shot learning for phishing URL classification, demonstrating that LLMs can act as interpretable one-shot classifiers.
- **Evaluating five state-of-the-art LLMs** on three different phishing URL datasets and comparing the framework's performance with existing supervised URL classifiers.
- **Demonstrating its performance in one-shot and zero-shot settings**, where GPT-4 Turbo achieved an average F1 score of 0.92 in the one-shot setting, only 0.07 points lower than the fully supervised setting.
- **Verifying the interpretability of the classification framework** by comparing the LLMs' self-explanations with post-hoc explanations obtained in the supervised setting, and by evaluating the correctness and language quality of the self-explanations.
- **Analyzing the consistency of LLM predictions** as well as performance in zero-shot and few-shot settings. The results show that increasing the number of examples has little impact on prediction accuracy.

Together, these contributions show that one-shot LLM prompting can approach the accuracy of supervised classifiers without dataset-specific training, while producing explanations that help users understand and trust the classification results.
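The one-shot CoT prompting idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' prompt or code: the prompt wording, the `Example` class, the demonstration URL (built on reserved `example.*` domains), and the helpers `build_one_shot_prompt` and `parse_label` are all hypothetical names chosen for illustration. The sketch only builds the prompt text and parses a reply; sending the prompt to a particular LLM is left out.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    """One worked demonstration shown to the model (hypothetical structure)."""
    url: str
    reasoning: str
    label: str  # "phishing" or "benign"


# Hypothetical demonstration, not taken from the paper or its datasets.
DEMO = Example(
    url="http://secure-login.example.com.account-verify.example.net/signin",
    reasoning=(
        "The host places a familiar-looking name in front of an unrelated "
        "registered domain, uses plain HTTP, and the path asks for sign-in."
    ),
    label="phishing",
)


def build_one_shot_prompt(url: str, demo: Example = DEMO) -> str:
    """Return a prompt with one worked example followed by the query URL."""
    return (
        "Classify the URL as 'phishing' or 'benign'. "
        "Reason step by step, then finish with a line 'Label: <answer>'.\n\n"
        f"URL: {demo.url}\n"
        f"Reasoning: {demo.reasoning}\n"
        f"Label: {demo.label}\n\n"
        f"URL: {url}\n"
        "Reasoning:"
    )


# Matches a verdict line such as "Label: benign" at the start of a line.
_LABEL_RE = re.compile(
    r"^label:\s*(phishing|benign)\b", re.IGNORECASE | re.MULTILINE
)


def parse_label(response: str) -> Optional[str]:
    """Extract the last verdict line from a model reply, or None if absent."""
    matches = _LABEL_RE.findall(response)
    return matches[-1].lower() if matches else None
```

In this sketch the text the model writes before the `Label:` line doubles as the natural-language explanation, so the classification and its explanation come from a single reply. Dropping the demonstration would give a zero-shot variant, and adding more `Example` entries would give a few-shot one.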