Leveraging RAG-Enhanced Large Language Model for Semi-Supervised Log Anomaly Detection

Wanhao Zhang, Qianli Zhang, Enyu Yu, Yuxiang Ren, Yeqing Meng, Mingxi Qiu, Jilong Wang
DOI: https://doi.org/10.1109/issre62328.2024.00026
2024-01-01
Abstract: Log-based anomaly detection is critical for monitoring the operation of information systems and for reporting system failures in real time. Deep learning-based methods detect anomalies in logs effectively, but existing methods depend heavily on log parsers, and parsing errors can considerably degrade downstream anomaly detection. In addition, methods that predict the next log event in a sequence are sensitive to unstable sequences and to previously unseen logs that appear as systems evolve, which leads to a higher false positive rate. In this paper, we propose LogRAG, a semi-supervised log anomaly detection framework based on retrieval-augmented generation (RAG). The framework performs phased detection using both Log Tokens and Log Templates to mitigate the impact of log parsing errors. It also uses a single-class classifier to model the normal behavior of the system, thereby avoiding the effects of unstable sequences. Finally, it employs a large language model (LLM) empowered by RAG to reevaluate the logs detected as anomalous, thereby improving accuracy. Compared with the previous best semi-supervised learning algorithm, LogRAG demonstrates a 15% improvement in F1 Score on the BGL dataset and a 60% improvement on the Spirit dataset.
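The two-stage idea in the abstract (a one-class model of normal behavior, followed by a retrieval-backed second opinion on flagged logs) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the encoder, the classifier, the retrieval store, or the prompt, so everything below is an assumption made for illustration. The bag-of-words `embed`, the centroid-based `OneClassDetector`, both thresholds, and `stub_llm_judge` (a placeholder standing in for a real LLM call) are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical training data: only normal logs are needed (semi-supervised).
NORMAL_LOGS = [
    "connection established to node 1",
    "connection established to node 2",
    "heartbeat received from node 1",
    "heartbeat received from node 2",
]

def embed(line):
    """Toy bag-of-words vector (assumption; a real system would use a learned encoder)."""
    return Counter(line.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class OneClassDetector:
    """Models normal behavior as the centroid of normal-log vectors."""

    def __init__(self, normal_logs, threshold=0.5):
        self.normal = [(line, embed(line)) for line in normal_logs]
        self.centroid = Counter()
        for _, vec in self.normal:
            self.centroid.update(vec)
        self.threshold = threshold  # hypothetical cut-off

    def is_suspicious(self, line):
        # Stage 1: flag logs that are far from the normal centroid.
        return cosine(embed(line), self.centroid) < self.threshold

    def retrieve(self, line, k=2):
        # Retrieval step: the k most similar known-normal logs.
        q = embed(line)
        ranked = sorted(self.normal, key=lambda p: cosine(q, p[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

def stub_llm_judge(line, context):
    """Placeholder for an LLM call (assumption). Returns True if the log
    looks anomalous given the retrieved normal examples."""
    q = embed(line)
    best = max((cosine(q, embed(c)) for c in context), default=0.0)
    return best < 0.8  # hypothetical cut-off

def detect(detector, line, judge=stub_llm_judge):
    """Stage 1 filters; stage 2 re-evaluates only the flagged logs."""
    if not detector.is_suspicious(line):
        return False
    return judge(line, detector.retrieve(line))
```

In this sketch the second stage runs only on logs that the first stage flags, so the comparatively expensive judge is called on a small fraction of the input. Passing a different `judge` callable is where a real LLM client would plug in.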