Abstract: Large language models (LLMs) can be used to analyze cyber threat intelligence (CTI) data from cybercrime forums, which contain extensive information and key discussions about emerging cyber threats. However, to date, the level of accuracy and efficiency of LLMs for such critical tasks has yet to be thoroughly evaluated. Hence, this study assesses the performance of an LLM system built on the OpenAI GPT-3.5-turbo model [8] to extract CTI information. To do so, a random sample of more than 700 daily conversations from three cybercrime forums - XSS, Exploit_in, and RAMP - was extracted, and the LLM system was instructed to summarize the conversations and predict 10 key CTI variables, such as whether a large organization and/or a critical infrastructure is being targeted, with only simple human-language instructions. Then, two coders reviewed each conversation and evaluated whether the information extracted by the LLM was accurate. The LLM system performed well, with an average accuracy score of 96.23%, an average precision of 90% and an average recall of 88.2%. Various ways to enhance the model were uncovered, such as the need to help the LLM distinguish between stories and past events, as well as being careful with verb tenses in prompts. Nevertheless, the results of this study highlight the relevance of using LLMs for cyber threat intelligence.

What problem does this paper attempt to address?

This paper evaluates how accurately and efficiently large language models (LLMs) can extract and summarize cyber threat intelligence (CTI) from cybercrime forums. Specifically, the research aims to answer the following questions:

1. **Can LLMs be used accurately for cyber threat intelligence?** The research measures LLM accuracy by testing performance on conversational data from cybercrime forums.
2. **Can LLMs replace junior threat analysts?** Junior threat analysts are usually responsible for reading cybercrime forums and extracting the relevant information. The research explores whether LLMs can reach a level comparable to that of human analysts on such tasks.
3. **How cost-effective are LLMs for CTI?** The research also considers the cost-effectiveness of LLMs for CTI, especially compared to traditional machine-learning classifiers, which require a large amount of manually labeled and curated training data.

To answer these questions, the researchers used an LLM system based on the OpenAI GPT-3.5-turbo model and randomly sampled more than 700 daily conversations from three cybercrime forums (XSS, Exploit.in, and RAMP). They instructed the LLM system to summarize these conversations and predict 10 key CTI variables, such as whether a large organization or a critical infrastructure is being targeted. Two coders then reviewed each conversation to assess whether the information extracted by the LLM was accurate.

The results show that the LLM system performed well at extracting CTI information, with an average accuracy of 96.23%, an average precision of 90%, and an average recall of 88.2%. However, the research also identifies several areas for improvement, such as helping the LLM distinguish between stories and past events, and paying attention to verb tenses in prompts.

Overall, this research supports the use of LLMs for CTI, but also emphasizes the need for further optimization, especially when dealing with complex definitions such as "large organizations" or "critical infrastructures".
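To make the three reported metrics concrete, the sketch below shows one way accuracy, precision, and recall could be computed for each binary CTI variable from the coders' verified labels and then averaged across variables. It is a minimal illustration, not the authors' evaluation code. The variable names, the toy labels, the unweighted (macro) averaging, and the choice to report 0.0 when a denominator is empty are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    accuracy: float
    precision: float
    recall: float


def score_variable(predicted: list[bool], verified: list[bool]) -> Scores:
    """Score one binary CTI variable against the coders' verified labels."""
    if len(predicted) != len(verified) or not predicted:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(predicted, verified))
    tp = sum(p and v for p, v in pairs)          # LLM said yes, coders agree
    fp = sum(p and not v for p, v in pairs)      # LLM said yes, coders disagree
    fn = sum(not p and v for p, v in pairs)      # LLM said no, coders say yes
    tn = len(pairs) - tp - fp - fn               # LLM said no, coders agree
    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: a metric with an empty denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return Scores(accuracy, precision, recall)


def macro_average(per_variable: list[Scores]) -> Scores:
    """Unweighted mean of each metric across variables (an assumed averaging scheme)."""
    n = len(per_variable)
    return Scores(
        sum(s.accuracy for s in per_variable) / n,
        sum(s.precision for s in per_variable) / n,
        sum(s.recall for s in per_variable) / n,
    )


# Made-up labels for two hypothetical variables over five conversations:
# (LLM prediction, coder-verified label) per conversation.
toy = {
    "targets_large_organization": (
        [True, False, True, False, False],
        [True, False, False, False, True],
    ),
    "targets_critical_infrastructure": (
        [False, False, True, False, False],
        [False, False, True, False, False],
    ),
}

per_variable = {name: score_variable(pred, gold) for name, (pred, gold) in toy.items()}
overall = macro_average(list(per_variable.values()))
print(per_variable)
print(overall)
```

Because most conversations are negative for any given variable, accuracy can stay high even when precision or recall is noticeably lower. That is one reason the three figures are worth reading together.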

The Use of Large Language Models (LLM) for Cyber Threat Intelligence (CTI) in Cybercrime Forums
