LogFiT: Log Anomaly Detection Using Fine-Tuned Language Models

Crispin Almodovar,S. Azad,Fariza Sabrina,Sarvnaz Karimi
DOI: https://doi.org/10.1109/TNSM.2024.3358730
2024-04-01
IEEE Transactions on Network and Service Management
Abstract:System logs are a valuable source of information for monitoring and maintaining the security and stability of computer systems. Techniques based on Deep Learning and Natural Language Processing have demonstrated effectiveness in detecting abnormal behaviour from these system logs. However, existing anomaly detection approaches have limitations in terms of flexibility and practicality. Techniques that rely on log templates such as DeepLog and LogBERT fail to capture semantic information and are unable to handle variability in log content. On the other hand, classification-based approaches such as LogSy, LogRobust and HitAnomaly require time-consuming data labelling for supervised training. In this paper, a novel log anomaly detection model named LogFiT is proposed. The LogFiT model doesn’t make use of a vocabulary of log templates and it doesn’t require any labeled data as the model only requires self-supervised training. The LogFiT model uses a pretrained Bidirectional Encoder Representations from Transformers (BERT)-based language model fine-tuned to recognise the linguistic patterns of the normal log data. The LogFiT model is trained using masked sentence prediction on the normal log data only. Consequently, when presented with the new log data, the model’s top- ${k}$ token prediction accuracy serves as a threshold for determining whether the new log data deviates from the normal log data. Experimental results show that LogFiT’s F1 score exceeds that of baselines on the HDFS, BGL, and Thunderbird datasets. Critically, when variability is introduced in the log data during evaluation, LogFiT retains its effectiveness compared to that of baselines.
Computer Science
What problem does this paper attempt to address?