Assessing Large Language Model’s knowledge of threat behavior in MITRE ATT&CK

Erik Hemberg, Stephen Moskal, Ethan Garza, Una-May O’Reilly
Abstract: In the rapidly evolving field of cyber defense, acquiring expertise on threat behavior and mitigations is both time-consuming and non-trivial. This paper investigates the knowledge of threat behavior in MITRE ATT&CK exhibited by GPT-3.5, a Large Language Model (LLM). We systematically explore different input prompts to generate questions and assess the number of correct questions based on Subject Matter Expert (SME) and LLM evaluation. We analyze various prompts to elicit accurate responses to these questions from a set of LLMs. Our findings indicate that LLMs can generate questions and answers about threat behaviors and mitigations. However, GPT-3.5 may struggle to rate the quality of the generated questions. This study contributes to the understanding of LLM knowledge, capacity, and risks in the cyber security domain. It also highlights their potential applications for assessing cyber security knowledge.
Computer Science