LogGenius: An Unsupervised Log Parsing Framework with Zero-shot Prompt Engineering

Xian Yu, Shengxi Nong, Dongbiao He, Weijie Zheng, Teng Ma, Ning Liu, Jianhui Li, Gaogang Xie
DOI: https://doi.org/10.1109/icws62655.2024.00159
2024-01-01
Abstract: Efficient and accurate parsing of unstructured logs is crucial for anomaly detection, root cause localization, and log compression. Although many existing works have made good progress by relying on Large Language Models (LLMs) and prompt engineering techniques, most of them require some degree of labeling or few-shot prompts, which limits their applicability in large-scale, real-time, heterogeneous log environments. To tackle this issue, we develop LogGenius, a novel unsupervised log parsing framework. It first enriches the diversity of the parsed logs by leveraging generative LLMs with zero-shot prompts. It then applies an unsupervised parsing model to the augmented log data to accomplish log parsing. To alleviate the impact of potential hallucinations from generative LLMs, we conduct a meticulous analysis and summarize the biases inherent in LLMs when they are applied directly to generate diversified logs. Building on these insights, we propose an effective log diversity augmentation algorithm that mitigates these biases. We thoroughly evaluate LogGenius on various open-source system runtime log datasets and a new alarm log dataset from a commercial cloud production environment. The experimental results demonstrate that LogGenius can improve parsing accuracy by up to about 30%, and parsing accuracy on unseen logs by up to about 100%, compared to state-of-the-art unsupervised methods.