Abstract: In modern IT systems and computer networks, real-time and offline event log analysis is a crucial part of cyber security monitoring. In particular, event log analysis techniques are essential for the timely detection of cyber attacks and for assisting security experts with the analysis of past security incidents. The detection of line patterns or templates from unstructured textual event logs has been identified as an important task of event log analysis since detected templates represent event types in the event log and prepare the logs for downstream online or offline security monitoring tasks. During the last two decades, a number of template mining algorithms have been proposed. However, many proposed algorithms rely on traditional data mining techniques, and the usage of Large Language Models (LLMs) has received less attention so far. Also, most approaches that harness LLMs are supervised, and unsupervised LLM-based template mining remains an understudied area. The current paper addresses this research gap and investigates the application of LLMs for unsupervised detection of templates from unstructured security event logs.

What problem does this paper attempt to address?

This paper attempts to solve the following problems:

1. **Limitations of existing template detection methods**: Most existing template detection algorithms rely on traditional data mining techniques. These techniques make hyperparameter selection difficult, and many algorithms assume that log messages contain the same number of words, which is not always true in practice. In addition, many existing LLM-based methods are supervised and require labeled training data, and creating and updating such data sets is labor-intensive.
2. **Under-utilization of LLMs for unsupervised template mining**: Although some research has begun to explore the use of LLMs for template detection, most of it has focused on supervised methods, and unsupervised LLM-based template mining remains understudied.
3. **Security of sensitive data**: Security event logs usually contain sensitive information and cannot be shared with external service providers. Using an LLM for template detection without revealing that information is therefore an important issue.
4. **Data sets used in existing research**: Existing research often uses data sets that are outdated or unrelated to network security, which limits the practical value of the results.

To solve these problems, this paper proposes a new unsupervised LLM-based template mining method, called LLM-TD, which focuses on the following aspects:

- **Application of a local LLM**: To protect sensitive data, the method uses a small local LLM for processing instead of relying on public LLMs provided by external services.
- **Unsupervised learning paradigm**: Unlike previous supervised methods that require labeled data, LLM-TD adopts an unsupervised in-context learning paradigm and lets the LLM identify multiple templates by constructing appropriate prompts.
- **Efficient processing of diverse log messages**: LLM-TD reduces the diversity of log messages by splitting log data by application, thereby improving the accuracy of the LLM in identifying templates.
- **Batch-processing mechanism**: LLM-TD can process multiple log messages at once to improve efficiency and does not require complex batch-creation methods.

In summary, this paper aims to overcome the limitations of existing methods for processing security event logs, especially the challenges of data sensitivity and algorithm performance, by proposing a new unsupervised LLM-based template detection method.
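The splitting and batching steps described above can be illustrated with a minimal sketch. It is not the LLM-TD implementation: the BSD-syslog-style line layout, the function names (`split_by_program`, `batches`, `build_prompt`), the batch size, and the prompt wording are all assumptions made for illustration, and the call to the local LLM is left out because the summary names neither the model nor its runtime.

```python
import re
from collections import defaultdict
from typing import Dict, Iterator, List

# Assumed line layout (BSD-syslog style):
#   "<Mon> <day> <hh:mm:ss> <host> <program>[<pid>]: <message>"
SYSLOG_RE = re.compile(
    r"^\w{3}\s+\d+\s[\d:]{8}\s(?P<host>\S+)\s"
    r"(?P<program>[^\s\[:]+)(?:\[\d+\])?:\s(?P<message>.*)$"
)


def split_by_program(lines: List[str]) -> Dict[str, List[str]]:
    """Group message texts by the application that logged them."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = SYSLOG_RE.match(line)
        if match:  # lines that do not fit the assumed layout are skipped
            groups[match.group("program")].append(match.group("message"))
    return dict(groups)


def batches(messages: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive fixed-size slices; the last one may be shorter."""
    for start in range(0, len(messages), size):
        yield messages[start:start + size]


def build_prompt(program: str, batch: List[str]) -> str:
    """Build a hypothetical prompt asking for templates of one batch."""
    numbered = "\n".join(f"{i}. {msg}" for i, msg in enumerate(batch, 1))
    return (
        f"The log messages below were produced by the application '{program}'.\n"
        "Identify the templates that describe them, replacing variable parts "
        "with <*>. Return one template per line.\n\n"
        f"{numbered}"
    )


# Made-up sample lines, for illustration only.
SAMPLE = [
    "Jan 12 10:15:01 host1 sshd[1234]: Accepted password for alice from 10.0.0.5 port 50022 ssh2",
    "Jan 12 10:15:03 host1 sshd[1240]: Failed password for bob from 10.0.0.9 port 50100 ssh2",
    "Jan 12 10:15:07 host1 CRON[999]: (root) CMD (run-parts /etc/cron.hourly)",
    "Jan 12 10:15:09 host1 sshd[1251]: Accepted password for carol from 10.0.0.7 port 50311 ssh2",
]

if __name__ == "__main__":
    for program, messages in split_by_program(SAMPLE).items():
        for batch in batches(messages, size=2):
            # In a real pipeline, this prompt would be sent to a local LLM.
            print(build_prompt(program, batch))
            print("-" * 40)
```

Each prompt contains messages from a single application only, which mirrors the idea of reducing message diversity before the LLM is asked for templates. Fixed-size slicing stands in for the simple batching that the summary describes.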

Using Large Language Models for Template Detection from Security Event Logs
