Improving Log-Based Anomaly Detection by Pre-Training Hierarchical Transformers

Shaohan Huang, Yi Liu, Carol Fung, He Wang, Hailong Yang, Zhongzhi Luan

DOI: https://doi.org/10.1109/tc.2023.3257518

IF: 3.183

2023-01-01

IEEE Transactions on Computers

Abstract: Pre-trained models, such as BERT, have resulted in significant improvements in many natural language processing (NLP) applications. However, due to differences in word distribution and domain data distribution, applying NLP advancements to log analysis directly faces some performance challenges. This paper studies how to adapt the recently introduced pre-trained language model BERT for log analysis. In this work, we propose a pre-trained log representation model with hierarchical bidirectional encoder transformers (namely, HilBERT). Unlike previous work, which used raw text as pre-training data, we parse logs into templates before using the log templates to pre-train HilBERT. We also design a hierarchical transformers model to capture log template sequence-level information. We use log-based anomaly detection for downstream tasks and fine-tune our model with different log data. Our experiments demonstrate that HilBERT outperforms other baseline techniques on unstable log data. While BERT obtains performance comparable to that of previous state-of-the-art models, HilBERT can significantly address the problem of log instability and achieve accurate and robust results.
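The step of parsing logs into templates can be illustrated with a minimal sketch. It assumes a simple regex-masking approach in which variable fields are replaced by a `<*>` placeholder; the abstract does not say which parser the authors use, so the rules, the placeholder, and the function name `to_template` are all hypothetical.

```python
import re

# Hypothetical masking rules, applied in order. A real log parser would
# derive templates from data instead of relying on a fixed rule list.
_MASKS = [
    re.compile(r"\bblk_-?\d+\b"),                          # HDFS-style block IDs
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),   # IPv4 address, optional port
    re.compile(r"\b0x[0-9a-fA-F]+\b"),                     # hexadecimal values
    re.compile(r"\b\d+\b"),                                # remaining integers
]


def to_template(message: str) -> str:
    """Replace variable fields in a raw log message with the <*> placeholder."""
    for pattern in _MASKS:
        message = pattern.sub("<*>", message)
    return message


if __name__ == "__main__":
    a = "Received block blk_-1608999687919862906 of size 91178 from 10.250.10.6"
    b = "Received block blk_7503483334202473044 of size 233217 from 10.251.215.16"
    print(to_template(a))
    print(to_template(a) == to_template(b))
```

Both example messages reduce to the same template, so a model pre-trained on templates sees one event type rather than two unrelated strings.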

engineering, electrical & electronic; computer science, hardware & architecture

What problem does this paper attempt to address?

The paper addresses log-based anomaly detection in system logs, focusing on two difficulties: previously unseen log events and log instability. Existing methods perform poorly on unseen log events or sequences, and manual log inspection becomes impractical as systems grow more complex. The paper proposes HilBERT, a pre-trained log representation model that combines a log encoder with a sequence encoder in a hierarchical transformer architecture. Unlike previous work that pre-trains on raw text, HilBERT first parses raw logs into templates and pre-trains on those templates; the hierarchical design then captures information at the level of log template sequences. Fine-tuned for anomaly detection, HilBERT is reported to be more accurate and robust on unstable datasets, especially those containing many unseen log events or sequences. In the authors' experiments, HilBERT outperforms the baseline methods on unstable log data, whereas BERT performs comparably to previous state-of-the-art models.
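The two-level structure described above (a log encoder over the words of each template, then a sequence encoder over the resulting template vectors) can be sketched as follows. This only illustrates the data flow and is not the authors' model: mean pooling stands in for both transformer encoders, and the toy embedding and the names `word_vector`, `encode_template`, and `encode_sequence` are hypothetical.

```python
from typing import List

Vector = List[float]
DIM = 4  # toy embedding size, chosen arbitrarily for this sketch


def word_vector(word: str) -> Vector:
    """Deterministic toy embedding: count characters into DIM buckets."""
    vec = [0.0] * DIM
    for ch in word:
        vec[ord(ch) % DIM] += 1.0
    return vec


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean; returns a zero vector for an empty input."""
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def encode_template(template: str) -> Vector:
    """Level 1 (stand-in for the log encoder): words -> one template vector."""
    return mean([word_vector(w) for w in template.split()])


def encode_sequence(templates: List[str]) -> Vector:
    """Level 2 (stand-in for the sequence encoder): template vectors -> one sequence vector."""
    return mean([encode_template(t) for t in templates])


if __name__ == "__main__":
    window = [
        "Received block <*> of size <*> from <*>",
        "Deleting block <*> file <*>",
    ]
    print(encode_sequence(window))
```

In a real model each level would be a transformer encoder and the sequence vector would feed an anomaly classifier. The sketch only shows that a whole log window reduces to one representation built from word-level inputs, which suggests how a template never seen during training could still be encoded.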

Log Anomaly Detection method based on BERT model optimization

LogBERT: Log Anomaly Detection via BERT

Leveraging Large Language Models and BERT for Log Parsing and Anomaly Detection

BertHTLG: Graph-Based Microservice Anomaly Detection Through Sentence-Bert Enhancement

Natural Language Processing-based Model for Log Anomaly Detection

HitAnomaly: Hierarchical Transformers for Anomaly Detection in System Log

BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language Model

LLMeLog: an Approach for Anomaly Detection Based on LLM-enriched Log Events

LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection

LogBD: A Log Anomaly Detection Method Based on Pretrained Models and Domain Adaptation

Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models

LAnoBERT: System Log Anomaly Detection based on BERT Masked Language Model

Research on Log Anomaly Detection Based on Sentence-BERT

An Anomaly Detection Approach of Part-of-Speech Log Sequence Via Population Based Training

What Information Contributes to Log-based Anomaly Detection? Insights from a Configurable Transformer-Based Approach

LogFiT: Log Anomaly Detection Using Fine-Tuned Language Models

CLDTLog: System Log Anomaly Detection Method Based on Contrastive Learning and Dual Objective Tasks

Log-based Anomaly Detection Without Log Parsing

Hierarchical Transformers for Long Document Classification

SemLog: A Semantics-based Approach for Anomaly Detection in Big Data System Logs