Self-Evolutionary Group-wise Log Parsing Based on Large Language Model

Changhua Pei, Zihan Liu, Jianhui Li, Erhan Zhang, Le Zhang, Haiming Zhang, Wei Chen, Dan Pei, Gaogang Xie
DOI: https://doi.org/10.1109/issre62328.2024.00016
2024-01-01
Abstract: Log parsing extracts appropriate templates from semi-structured logs, providing foundational information for downstream log analysis tasks such as anomaly detection and log comprehension. Initially, log parsing was approached by domain experts who manually designed heuristic rules to extract templates. However, these manual rules become less effective when a new log dataset has characteristics that do not conform to the pre-designed rules. Introducing large language models (LLMs) into log parsing has yielded promising results in addressing this problem. Nevertheless, existing LLM-based approaches have two limitations: they rely on manually annotated templates within the prompt, and they process logs inefficiently. To address these challenges, we propose a self-evolving method called SelfLog. On the one hand, SelfLog uses similar log-template pairs that the LLM itself extracted from historical data as the prompt for a new log, allowing the model to evolve by itself without labeled data. On the other hand, we propose an N-Gram-based grouper and a log hitter. Extracting templates group-wise rather than log-wise improves the LLM's parsing performance, and skipping the LLM for logs whose group template has already been extracted significantly reduces unnecessary LLM calls. We evaluate the accuracy and efficiency of SelfLog on 16 public datasets comprising tens of millions of logs. The experiments show that SelfLog achieves state-of-the-art (SOTA) results, with a grouping accuracy (GA) of 0.975 and a parsing accuracy (PA) of 0.942. More importantly, it processes 45,000 logs per second without sacrificing accuracy.
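The abstract names the N-Gram-based grouper and the log hitter but does not specify how they work. The Python block below is a minimal sketch under assumptions of its own, not SelfLog's implementation. It assumes whitespace tokenization, treats a token as constant if it lies inside a bigram (padded with hypothetical `<s>`/`</s>` boundary markers) seen at least `min_count` times across all logs so far, and masks the remaining tokens as `<*>` to form a group key. A dictionary keyed by that group key plays the role of the log hitter, so the extraction step runs once per new group instead of once per log. `GroupwiseParser`, `positional_stub`, `n`, and `min_count` are hypothetical names, and `positional_stub` is a deterministic placeholder standing in for the LLM call.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Template = str


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def pad(tokens: List[str]) -> List[str]:
    """Add boundary markers so the first and last tokens also occur inside n-grams."""
    return ["<s>"] + tokens + ["</s>"]


class GroupwiseParser:
    """Hypothetical sketch of an n-gram grouper plus a template cache (log hitter)."""

    def __init__(self, extract_template: Callable[[List[str]], Template],
                 n: int = 2, min_count: int = 2) -> None:
        self.extract_template = extract_template  # stand-in for the LLM call
        self.n = n
        self.min_count = min_count
        self.counts: Counter = Counter()          # n-gram counts over every log seen so far
        self.templates: Dict[str, Template] = {}  # group key -> extracted template
        self.llm_calls = 0

    def group_key(self, tokens: List[str]) -> str:
        """Keep tokens covered by a frequent n-gram and mask the rest as <*>."""
        padded = pad(tokens)
        keep = [False] * len(padded)
        for i, gram in enumerate(ngrams(padded, self.n)):
            if self.counts[gram] >= self.min_count:
                for j in range(i, i + self.n):
                    keep[j] = True
        # keep[1:-1] drops the flags of the two boundary markers
        return " ".join(t if k else "<*>" for t, k in zip(tokens, keep[1:-1]))

    def parse_batch(self, logs: List[str]) -> List[Template]:
        tokenized = [log.split() for log in logs]
        for toks in tokenized:
            self.counts.update(ngrams(pad(toks), self.n))
        keys = [self.group_key(toks) for toks in tokenized]
        groups: Dict[str, List[str]] = {}
        for log, key in zip(logs, keys):
            groups.setdefault(key, []).append(log)
        for key, members in groups.items():
            # A cached key is a hit and skips the call; a new key costs one call per group.
            if key not in self.templates:
                self.llm_calls += 1
                self.templates[key] = self.extract_template(members)
        return [self.templates[key] for key in keys]


def positional_stub(group: List[str]) -> Template:
    """Deterministic placeholder for the LLM: keep tokens shared by every log in the group."""
    columns = zip(*(log.split() for log in group))
    return " ".join(col[0] if len(set(col)) == 1 else "<*>" for col in columns)


parser = GroupwiseParser(positional_stub)
first = parser.parse_batch([
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Connection from 10.0.0.3 closed",
    "User alice logged in",
    "User bob logged in",
])
print(first)             # 3 x 'Connection from <*> closed', 2 x 'User <*> logged in'
print(parser.llm_calls)  # 2: one call per group, not per log
second = parser.parse_batch(["Connection from 10.0.0.9 closed"])
print(second)            # ['Connection from <*> closed'], served from the cache
print(parser.llm_calls)  # still 2
```

In this sketch, a cache hit skips the expensive extraction step entirely, which is the kind of saving the abstract attributes to the log hitter. Two limitations of these assumptions: with bigrams, a constant token flanked by variables on both sides is masked, and a formerly rare token that later becomes frequent changes the group key and triggers a fresh extraction.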