The Use of Large Language Models (LLM) for Cyber Threat Intelligence (CTI) in Cybercrime Forums

Vanessa Clairoux-Trepanier, Isa-May Beauchamp, Estelle Ruellan, Masarah Paquet-Clouston, Serge-Olivier Paquette, Eric Clay
2024-10-01
Abstract: Large language models (LLMs) can be used to analyze cyber threat intelligence (CTI) data from cybercrime forums, which contain extensive information and key discussions about emerging cyber threats. However, to date, the level of accuracy and efficiency of LLMs for such critical tasks has yet to be thoroughly evaluated. Hence, this study assesses the performance of an LLM system built on the OpenAI GPT-3.5-turbo model [8] to extract CTI information. To do so, a random sample of more than 700 daily conversations from three cybercrime forums - XSS, Exploit_in, and RAMP - was extracted, and the LLM system was instructed to summarize the conversations and predict 10 key CTI variables, such as whether a large organization and/or a critical infrastructure is being targeted, with only simple human-language instructions. Then, two coders reviewed each conversation and evaluated whether the information extracted by the LLM was accurate. The LLM system performed well, with an average accuracy score of 96.23%, an average precision of 90% and an average recall of 88.2%. Various ways to enhance the model were uncovered, such as the need to help the LLM distinguish between stories and past events, as well as being careful with verb tenses in prompts. Nevertheless, the results of this study highlight the relevance of using LLMs for cyber threat intelligence.
Cryptography and Security, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
This paper evaluates how accurately and efficiently large language models (LLMs) can extract and summarize cyber threat intelligence (CTI) from cybercrime forums. Specifically, the research aims to answer the following questions:

1. **Can LLMs be used accurately for cyber threat intelligence?** The research evaluates the accuracy of an LLM by testing its performance on conversational data from cybercrime forums.
2. **Can LLMs replace junior threat analysts?** Junior threat analysts are usually responsible for reading cybercrime forums and extracting the relevant information. The research explores whether an LLM can reach a level comparable to that of human analysts on such tasks.
3. **How cost-effective is it to use LLMs for CTI?** The research also considers the cost-effectiveness of LLMs for CTI, especially compared to traditional machine-learning classifiers, which require a large amount of manually labeled and curated training data.

To answer these questions, the researchers used an LLM system based on the OpenAI GPT-3.5-turbo model and randomly sampled more than 700 daily conversations from three cybercrime forums (XSS, Exploit.in, and RAMP). They instructed the LLM system to summarize these conversations and predict 10 key CTI variables, such as whether a large organization or a critical infrastructure is being targeted. Two coders then reviewed each conversation to assess whether the information extracted by the LLM was accurate.

The results show that the LLM system performed well at extracting CTI information, with an average accuracy of 96.23%, an average precision of 90%, and an average recall of 88.2%. However, the research also identifies several areas for improvement, such as helping the LLM distinguish between stories and past events, and paying attention to verb tenses in prompts.
Overall, this research validates the effective application of LLMs in CTI, but also emphasizes the need for further optimization, especially when dealing with complex definitions such as "large organizations" or "critical infrastructures".
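To make the reported metrics concrete, the following is a minimal sketch of how accuracy, precision, and recall could be computed for one binary CTI variable once human coders have validated the LLM's predictions. The function name `score_variable`, the `Metrics` container, and the toy labels are hypothetical illustrations, not the authors' code or data; the paper's averages are taken over its 10 variables, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float


def score_variable(predicted: Sequence[bool], validated: Sequence[bool]) -> Metrics:
    """Compare LLM predictions with coder-validated labels for one binary variable."""
    if len(predicted) != len(validated):
        raise ValueError("predicted and validated must have the same length")

    pairs = list(zip(predicted, validated))
    tp = sum(1 for p, v in pairs if p and v)          # LLM said yes, coders agree
    fp = sum(1 for p, v in pairs if p and not v)      # LLM said yes, coders disagree
    tn = sum(1 for p, v in pairs if not p and not v)  # LLM said no, coders agree
    fn = sum(1 for p, v in pairs if not p and v)      # LLM said no, coders disagree

    total = len(pairs)
    # Guard against empty denominators by reporting 0.0 (an assumption of this sketch).
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Metrics(accuracy, precision, recall)


# Toy labels (hypothetical): "is a large organization targeted?" for five conversations.
llm_predictions = [True, True, False, False, True]
coder_labels = [True, False, False, True, True]
metrics = score_variable(llm_predictions, coder_labels)
print(metrics)
```

High accuracy alongside lower precision or recall typically arises when a variable is rarely positive, which is one reason to report all three metrics rather than accuracy alone.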