Abstract: Software systems often record important runtime information in logs to help with troubleshooting. Log-based anomaly detection has become a key research area that aims to identify system issues through log data, ultimately enhancing the reliability of software systems. Traditional deep learning methods often struggle to capture the semantic information embedded in log data, which is typically organized in natural language. In this paper, we propose LogLLM, a log-based anomaly detection framework that leverages large language models (LLMs). LogLLM employs BERT for extracting semantic vectors from log messages, while utilizing Llama, a transformer decoder-based model, for classifying log sequences. Additionally, we introduce a projector to align the vector representation spaces of BERT and Llama, ensuring a cohesive understanding of log semantics. Unlike conventional methods that require log parsers to extract templates, LogLLM preprocesses log messages with regular expressions, streamlining the entire process. Our framework is trained through a novel three-stage procedure designed to enhance performance and adaptability. Experimental results across four public datasets demonstrate that LogLLM outperforms state-of-the-art methods. Even when handling unstable logs, it effectively captures the semantic meaning of log messages and detects anomalies accurately.
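The alignment step mentioned in the abstract can be pictured as a learned map from the encoder's vector space into the decoder's. The following is a minimal sketch only: it assumes the projector is a single linear layer, which the abstract does not specify, and it uses toy dimensions and random weights in place of trained parameters. The names `make_projector`, `ENCODER_DIM`, and `DECODER_DIM` are hypothetical and chosen for illustration.

```python
import random
from typing import Callable, List

Vector = List[float]


def make_projector(d_in: int, d_out: int, seed: int = 0) -> Callable[[Vector], Vector]:
    """Return a toy linear projector mapping d_in-dim vectors to d_out-dim vectors."""
    rng = random.Random(seed)
    # Random weights stand in for parameters that would be learned in training.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    bias = [0.0] * d_out

    def project(vec: Vector) -> Vector:
        if len(vec) != d_in:
            raise ValueError(f"expected {d_in} values, got {len(vec)}")
        # Plain matrix-vector product plus bias: one output per weight row.
        return [sum(w * x for w, x in zip(row, vec)) + b
                for row, b in zip(weights, bias)]

    return project


# Toy sizes (assumptions); real encoder and decoder widths are far larger.
ENCODER_DIM, DECODER_DIM = 4, 6
project = make_projector(ENCODER_DIM, DECODER_DIM)

# One semantic vector per log message, for a sequence of three messages.
message_vectors = [[0.1 * (i + j) for j in range(ENCODER_DIM)] for i in range(3)]

# After projection, each vector has the width the decoder expects.
decoder_inputs = [project(v) for v in message_vectors]
```

The point of the sketch is the change of shape: each message is represented by one encoder-sized vector, and the projector turns it into one decoder-sized vector, so a log sequence becomes a sequence of inputs that the classifier can consume.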

What problem does this paper attempt to address?

This paper addresses the difficulty that existing log-based anomaly detection methods have in capturing the semantic information in system logs. Specifically:

1. **Limitations of traditional deep-learning methods**:
   - Traditional deep-learning methods (such as LSTM and Transformer) often struggle to extract and understand the semantic information in log messages, which are written in natural language.
   - These methods often rely on log parsers to extract templates, but the parsers perform poorly on new or unstable logs, which results in a loss of semantic information.
2. **Challenges in applying large language models (LLMs)**:
   - Existing LLM-based methods fall into two groups: prompt-engineering-based and fine-tuning-based. Prompt-engineering methods rely on the LLM's internal knowledge for anomaly detection but offer limited customization for specific datasets; fine-tuning methods can be optimized for specific datasets but fall short in semantic understanding and in handling the input data format.

To solve these problems, the paper proposes a new framework named LogLLM, which aims to improve log-based anomaly detection by combining two different types of LLMs, BERT and Llama. Its specific goals are:

- **Using BERT to extract semantic vectors**: Extract semantic vectors from preprocessed log messages with BERT so that the semantic information in the logs is captured.
- **Using Llama for classification**: Classify log sequences with Llama to determine whether they are anomalous.
- **Introducing a projector to align the representation spaces**: Align the vector representation spaces of BERT and Llama through a projector to ensure semantic consistency between them.
- **Simplifying preprocessing**: Use regular expressions instead of log parsers for preprocessing, avoiding the complexity of template extraction and the potential loss of semantic information.

Through these improvements, LogLLM captures the semantic meaning of log messages more effectively and detects anomalies accurately even when dealing with unstable logs. Experimental results show that LogLLM outperforms existing state-of-the-art methods on four public datasets.
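The regex-based preprocessing described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the summary does not list the actual expressions, so the three patterns, the `<*>` placeholder token, and the `preprocess` function name are all hypothetical.

```python
import re

# Hypothetical patterns; the source does not list the expressions it uses.
# Order matters: specific patterns run before the generic number pattern.
PATTERNS = [
    re.compile(r"blk_-?\d+"),                             # HDFS-style block IDs
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4, optional port
    re.compile(r"\b\d+\b"),                               # standalone numbers
]

# Assumed constant token that replaces every variable parameter.
PLACEHOLDER = "<*>"


def preprocess(message: str) -> str:
    """Mask variable parameters so only the stable wording of a message remains."""
    for pattern in PATTERNS:
        message = pattern.sub(PLACEHOLDER, message)
    return message


print(preprocess("Received block blk_-1608999687919862906 of size 91178 from /10.250.10.6"))
# prints: Received block <*> of size <*> from /<*>
```

Because the message keeps its natural-language wording and only the variable fields are masked, no template extraction step is needed before the text is passed to the encoder.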

LogLLM: Log-based Anomaly Detection Using Large Language Models
