Abstract: Log-based anomaly detection has been widely studied in the literature as a way to increase the dependability of software-intensive systems. In reality, logs can be unstable due to changes made to the software during its evolution. This, in turn, degrades the performance of downstream log analysis activities, such as anomaly detection. The critical challenge in detecting anomalies on these unstable logs is the lack of information about the new logs, due to insufficient log data from new software versions. The application of Large Language Models (LLMs) to many software engineering tasks has revolutionized various domains. In this paper, we report on an experimental comparison of a fine-tuned LLM and alternative models for anomaly detection on unstable logs. The main motivation is that the pre-training of LLMs on vast datasets may enable a robust understanding of diverse patterns and contextual information, which can be leveraged to mitigate the data insufficiency issue in the context of software evolution. Our experimental results on the two-version dataset of LOGEVOL-Hadoop show that the fine-tuned LLM (GPT-3) fares slightly better than supervised baselines when evaluated on unstable logs. The difference between GPT-3 and other supervised approaches tends to become more significant as the degree of changes in log sequences increases. However, it is unclear whether the difference is practically significant in all cases. Lastly, our comparison of prompt engineering (with GPT-4) and fine-tuning reveals that the latter provides significantly superior performance on both stable and unstable logs, offering valuable insights into the effective utilization of LLMs in this domain.

What problem does this paper attempt to address?

The main problem this paper addresses is that, as software systems evolve, changes to the structure and content of their logs make those logs unstable, which degrades the performance of existing log-based anomaly detection methods. Specifically, when software is updated, logs may change as follows:

- **Log Template Level**: Log templates may be added, deleted, or modified.
- **Log Sequence Level**: The order of log messages may change, and some log templates may be added or removed.

As a result, logs generated by the new version of the software differ from those of the old version, and an anomaly detection model trained on old-version logs performs poorly on new-version logs. The research therefore explores how large language models (LLMs) can be used to cope with the resulting shortage of new-version log data in the context of software evolution.

### Research Background

1. **Log Instability Problem**:
   - Software systems are constantly evolving, and their logs change accordingly.
   - These changes make logs unstable and affect the performance of downstream log analysis tasks such as anomaly detection.
   - The main challenge is the lack of new-version log data, which makes it difficult to retrain or adjust existing models.
2. **Application of Large Language Models**:
   - LLMs learn a large number of text patterns and contextual information during pre-training.
   - This ability may help alleviate the data insufficiency caused by software evolution and improve the robustness of anomaly detection.

### Solutions

The paper examines two strategies for using LLMs to deal with the log instability problem:

1. **Fine-tuning**:
   - Fine-tune the LLM on task-specific log data so that it adapts to the task at hand, i.e., anomaly detection on unstable logs (ADUL).
   - In this way, the model can better understand the patterns and contextual information in the logs.
2. **Prompt Engineering**:
   - Construct effective prompts and feed them to the pre-trained LLM for anomaly detection.
   - Prompts usually include task descriptions, expected inputs/outputs, and examples of related tasks.

### Experimental Results

Through experiments on two public datasets (LOGEVOL-Hadoop and HDFS) and a synthetic dataset (SynHDFS), the authors draw the following conclusions:

1. **Fine-tuned GPT-3 performs slightly better than the supervised baselines on unstable logs**.
2. **As the degree of log change increases, the difference between fine-tuned GPT-3 and the other supervised methods becomes more significant**.
3. **Fine-tuned GPT-3 outperforms prompt-engineered GPT-4 on both stable and unstable logs, which provides valuable insights for effectively using LLMs for ADUL**.

### Summary

The main contribution of this paper is its exploration of LLMs for anomaly detection on unstable logs, in particular its comparison of fine-tuning and prompt engineering, which shows that fine-tuning is the stronger strategy for this task. This provides an important reference for future research and practical applications.
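To make the two kinds of log change and the prompt structure described above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the log templates, the helper names (`template_changes`, `unseen_ratio`, `build_prompt`), and the prompt wording are all hypothetical and are not taken from the paper or from LOGEVOL-Hadoop. Note that with a plain set difference, a modified template shows up as one removal plus one addition.

```python
from typing import Dict, List, Sequence, Set, Tuple


def template_changes(old: Set[str], new: Set[str]) -> Dict[str, Set[str]]:
    """Template-level change: templates added to or removed from the new version."""
    return {"added": new - old, "removed": old - new}


def unseen_ratio(sequence: Sequence[str], known: Set[str]) -> float:
    """Share of events in a sequence whose template was absent from the old version."""
    if not sequence:
        return 0.0
    return sum(1 for t in sequence if t not in known) / len(sequence)


def build_prompt(examples: List[Tuple[Sequence[str], str]], query: Sequence[str]) -> str:
    """Few-shot prompt: task description, expected output, labelled examples, query."""
    lines = [
        "Task: decide whether a log sequence is normal or anomalous.",
        "Answer with exactly one word: normal or anomalous.",
        "",
    ]
    for seq, label in examples:
        lines.append("Log sequence: " + " | ".join(seq))
        lines.append("Answer: " + label)
        lines.append("")
    lines.append("Log sequence: " + " | ".join(query))
    lines.append("Answer:")
    return "\n".join(lines)


# Hypothetical templates for two software versions (illustrative only).
old_templates = {"Starting task <*>", "Task <*> finished", "Connection to <*> failed"}
new_templates = {"Starting task <*>", "Task <*> finished in <*> ms", "Connection to <*> failed"}

changes = template_changes(old_templates, new_templates)
new_sequence = ["Starting task <*>", "Task <*> finished in <*> ms"]
ratio = unseen_ratio(new_sequence, old_templates)

prompt = build_prompt(
    examples=[
        (["Starting task <*>", "Task <*> finished"], "normal"),
        (["Starting task <*>", "Connection to <*> failed"], "anomalous"),
    ],
    query=new_sequence,
)

print(changes)
print(ratio)
print(prompt)
```

The resulting `prompt` string would be sent to a pre-trained model, whereas the fine-tuning strategy would instead train the model on labelled sequences; neither step is shown here because the paper summary does not specify the API calls used.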

Anomaly Detection on Unstable Logs with GPT Models
