Using Large Language Models for Template Detection from Security Event Logs

Risto Vaarandi, Hayretdin Bahsi
DOI: https://doi.org/10.48550/arXiv.2409.05045
2024-09-08
Abstract: In modern IT systems and computer networks, real-time and offline event log analysis is a crucial part of cyber security monitoring. In particular, event log analysis techniques are essential for the timely detection of cyber attacks and for assisting security experts with the analysis of past security incidents. The detection of line patterns or templates from unstructured textual event logs has been identified as an important task of event log analysis since detected templates represent event types in the event log and prepare the logs for downstream online or offline security monitoring tasks. During the last two decades, a number of template mining algorithms have been proposed. However, many proposed algorithms rely on traditional data mining techniques, and the usage of Large Language Models (LLMs) has received less attention so far. Also, most approaches that harness LLMs are supervised, and unsupervised LLM-based template mining remains an understudied area. The current paper addresses this research gap and investigates the application of LLMs for unsupervised detection of templates from unstructured security event logs.
Cryptography and Security
What problem does this paper attempt to address?
This paper addresses the following problems:

1. **Limitations of existing template detection methods**: Most template detection algorithms rely on traditional data mining techniques. These algorithms make hyperparameter selection difficult, and many of them assume that log messages of the same type contain the same number of words, which is not always true in practice. In addition, many existing LLM-based methods are supervised and require labeled training data, and creating and updating such data sets is labor-intensive.
2. **Under-utilization of LLMs for unsupervised template mining**: Although some research has begun to explore LLMs for template detection, most of it focuses on supervised methods, and unsupervised LLM-based template mining remains understudied.
3. **Confidentiality of sensitive data**: Security event logs usually contain sensitive information and cannot be shared with external service providers. Template detection with an LLM therefore has to work without revealing this information.
4. **Data sets used in existing research**: Existing research often uses data sets that are outdated or unrelated to cyber security, which limits the practical value of its results.

To address these problems, the paper proposes a new unsupervised LLM-based template mining method called LLM-TD, with the following characteristics:

- **Use of a local LLM**: To protect sensitive data, the method runs a small local LLM instead of relying on public LLMs offered by external services.
- **Unsupervised learning paradigm**: Unlike previous supervised methods that require labeled data, LLM-TD adopts an unsupervised in-context learning paradigm and lets the LLM identify multiple templates from appropriately constructed prompts.
- **Handling of diverse log messages**: LLM-TD reduces the diversity of log messages by splitting the log data by application, which improves the accuracy of the LLM in identifying templates.
- **Batch processing**: LLM-TD processes multiple log messages at once to improve efficiency and does not require complex batch-creation methods.

In summary, the paper aims to overcome the limitations of existing methods for processing security event logs, especially those related to data sensitivity and algorithm performance, by proposing a new unsupervised LLM-based template detection method.
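
The splitting and batching ideas described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the BSD-syslog-style line layout, the function names (`split_by_application`, `batches`, `build_prompt`), the batch size, and the prompt wording are all assumptions made for illustration, and the call to a local LLM is deliberately left out.

```python
import re
from collections import defaultdict
from typing import Dict, Iterator, List

# Assumed line layout (BSD-syslog style), not taken from the paper:
#   "Mon DD HH:MM:SS host program[pid]: message"
SYSLOG_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+"
    r"(?P<app>[^\s\[:]+)(?:\[\d+\])?:\s*(?P<msg>.*)$"
)


def split_by_application(lines: List[str]) -> Dict[str, List[str]]:
    """Group message texts by application name; lines that do not parse are skipped."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = SYSLOG_RE.match(line.strip())
        if match:
            groups[match.group("app")].append(match.group("msg"))
    return dict(groups)


def batches(messages: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive fixed-size batches; the last batch may be shorter."""
    for start in range(0, len(messages), size):
        yield messages[start:start + size]


def build_prompt(app: str, batch: List[str]) -> str:
    """Build a hypothetical prompt asking an LLM for templates of one batch."""
    numbered = "\n".join(f"{i}. {msg}" for i, msg in enumerate(batch, 1))
    return (
        f"The following log messages were produced by the application '{app}'.\n"
        "Identify the message templates and replace variable parts with <*>.\n"
        "Return one template per line.\n\n"
        f"{numbered}"
    )


if __name__ == "__main__":
    sample = [
        "Sep  8 10:15:01 host1 sshd[1234]: Failed password for root from 10.0.0.1 port 22 ssh2",
        "Sep  8 10:15:02 host1 CRON[99]: (root) CMD (run-parts /etc/cron.hourly)",
        "Sep  8 10:15:03 host1 sshd[1235]: Failed password for admin from 10.0.0.2 port 22 ssh2",
    ]
    for app, messages in split_by_application(sample).items():
        for batch in batches(messages, size=2):
            # Each prompt would be sent to a locally hosted LLM (not shown here).
            print(build_prompt(app, batch))
            print("---")
```

Grouping by application before prompting means each prompt contains messages from a single program, which is the diversity reduction the summary refers to; simple consecutive batching reflects the statement that no complex batch-creation method is needed.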