LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models

Aoxiao Zhong, Dengyao Mo, Guiyang Liu, Jinbu Liu, Qingda Lu, Qi Zhou, Jiesheng Wu, Quanzheng Li, Qingsong Wen
2024-08-25
Abstract: Logs are ubiquitous digital footprints, playing an indispensable role in system diagnostics, security analysis, and performance optimization. The extraction of actionable insights from logs is critically dependent on the log parsing process, which converts raw logs into structured formats for downstream analysis. Yet, the complexities of contemporary systems and the dynamic nature of logs pose significant challenges to existing automatic parsing techniques. The emergence of Large Language Models (LLMs) offers new horizons. With their expansive knowledge and contextual prowess, LLMs have been transformative across diverse applications. Building on this, we introduce LogParser-LLM, a novel log parser integrated with LLM capabilities. This union seamlessly blends semantic insights with statistical nuances, obviating the need for hyper-parameter tuning and labeled training data, while ensuring rapid adaptability through online parsing. Further deepening our exploration, we address the intricate challenge of parsing granularity, proposing a new metric and integrating human interactions to allow users to calibrate granularity to their specific needs. Our method's efficacy is empirically demonstrated through evaluations on the Loghub-2k and the large-scale LogPub benchmarks. In evaluations on the LogPub benchmark, involving an average of 3.6 million logs per dataset across 14 datasets, our LogParser-LLM requires only 272.5 LLM invocations on average, achieving a 90.6% F1 score for grouping accuracy and 81.1% parsing accuracy. These results demonstrate the method's high efficiency and accuracy, outperforming current state-of-the-art log parsers, including pattern-based, neural network-based, and existing LLM-enhanced approaches.
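The abstract reports two accuracy figures: an F1 score for grouping accuracy and a parsing accuracy. The sketch below shows how such metrics are commonly computed, assuming the definitions used in the Loghub-2.0/LogPub evaluation setting (a predicted group counts only if it contains exactly the same log lines as a ground-truth group; parsing accuracy is the fraction of lines whose template matches exactly). The function names `grouping_f1` and `parsing_accuracy` are hypothetical, and this is not the benchmark's official evaluation script.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def _groups(labels: Sequence[Hashable]) -> set:
    """Turn per-line template labels into a set of groups (frozensets of line indices)."""
    buckets = defaultdict(set)
    for idx, label in enumerate(labels):
        buckets[label].add(idx)
    return {frozenset(members) for members in buckets.values()}


def grouping_f1(truth: Sequence[str], pred: Sequence[str]) -> float:
    """F1 of grouping accuracy (assumed definition): a predicted group is correct
    only if it holds exactly the same log lines as some ground-truth group."""
    true_groups, pred_groups = _groups(truth), _groups(pred)
    correct = len(true_groups & pred_groups)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_groups)
    recall = correct / len(true_groups)
    return 2 * precision * recall / (precision + recall)


def parsing_accuracy(truth: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of log lines whose predicted template string matches the ground truth exactly."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must label the same log lines")
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


if __name__ == "__main__":
    truth = ["open <*>", "open <*>", "close <*>", "close <*>"]
    pred = ["open <*>", "open <*>", "close file <*>", "close <*>"]
    print(round(grouping_f1(truth, pred), 3))  # 0.4
    print(parsing_accuracy(truth, pred))       # 0.75
```

On this toy input, splitting the `close <*>` group into two predicted groups costs grouping F1 heavily (only 1 of 3 predicted groups is exact, so precision is 1/3, recall is 1/2, and F1 is 0.4), while parsing accuracy drops only to 0.75. This is one reason the two metrics are reported separately.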
Software Engineering, Artificial Intelligence
What problem does this paper attempt to address?
The paper aims to address the efficiency and accuracy of log parsing. Specifically:

- **Improving Log Parsing Efficiency and Accuracy**: The paper introduces a new method, LogParser-LLM, which leverages large language models (LLMs) to make log parsing more efficient and accurate. By combining the LLM's semantic understanding with statistical features of the logs, the method adapts quickly through online parsing and needs neither hyperparameter tuning nor labeled training data.
- **Reducing LLM Invocation Frequency**: By pairing a prefix tree with an LLM-based template extractor, the method sharply reduces the number of LLM invocations (272.5 on average per LogPub dataset), thereby lowering computational overhead.
- **Addressing Parsing Granularity Issues**: The paper also examines parsing granularity, that is, how coarsely or finely log messages are grouped into templates, and proposes a new metric, "Granularity Distance", to quantify the differences between parsing results. It further incorporates user feedback so that users can calibrate the parsing granularity to their specific needs.
- **Validating Method Effectiveness**: Evaluations on the Loghub-2k and large-scale LogPub benchmarks show that LogParser-LLM improves grouping accuracy and parsing accuracy by 48.3% and 32.0%, respectively, over current state-of-the-art log parsers. After calibration with a small amount of labeled data, these improvements rise further to 56.8% and 69.7%.

In summary, this paper brings LLMs into log parsing to make it more efficient and accurate than traditional methods, and tackles the granularity problem with a new evaluation metric and a user-interaction mechanism.
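The invocation savings come from consulting the LLM only when a log does not match an already-known template. The sketch below illustrates that caching idea under our own assumptions: whitespace tokenization, `<*>` as a wildcard that matches exactly one token, and a hypothetical `fake_llm_extractor` that masks digit-bearing tokens in place of a real LLM call. `PrefixTreeParser` and its methods are illustrative names, not the paper's implementation, and the real system's matching and template-merging rules may differ.

```python
import re
from typing import Callable, Dict, List, Optional

WILDCARD = "<*>"


class _Node:
    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.template: Optional[str] = None  # set on the node that ends a stored template


class PrefixTreeParser:
    """Hypothetical online parser: templates are cached in a token prefix tree,
    and the (expensive) extractor runs only when no stored template matches."""

    def __init__(self, extractor: Callable[[str], str]) -> None:
        self.root = _Node()
        self.extractor = extractor
        self.llm_calls = 0  # counts how often the stand-in "LLM" was invoked

    def _insert(self, template: str) -> None:
        node = self.root
        for tok in template.split():
            node = node.children.setdefault(tok, _Node())
        node.template = template

    def _match(self, node: _Node, tokens: List[str], i: int) -> Optional[str]:
        if i == len(tokens):
            return node.template
        # Try an exact constant-token edge first, then fall back to a wildcard edge.
        for key in (tokens[i], WILDCARD):
            child = node.children.get(key)
            if child is not None:
                found = self._match(child, tokens, i + 1)
                if found is not None:
                    return found
        return None

    def parse(self, log: str) -> str:
        template = self._match(self.root, log.split(), 0)
        if template is None:  # cache miss: ask the extractor and remember the result
            self.llm_calls += 1
            template = self.extractor(log)
            self._insert(template)
        return template


def fake_llm_extractor(log: str) -> str:
    """Stand-in for an LLM template extractor: masks any token containing a digit."""
    return " ".join(WILDCARD if re.search(r"\d", tok) else tok for tok in log.split())


if __name__ == "__main__":
    parser = PrefixTreeParser(fake_llm_extractor)
    logs = ["Connection from 10.0.0.1 closed", "Connection from 10.0.0.2 closed",
            "Disk usage 91%", "Disk usage 47%"]
    for line in logs:
        print(parser.parse(line))
    print("extractor calls:", parser.llm_calls)  # 2 calls for 4 lines
```

Because the expensive call sits only on the miss path, its count grows with the number of distinct templates rather than the number of log lines. For scale, the abstract's averages of 272.5 LLM invocations for about 3.6 million logs per LogPub dataset amount to roughly one call per 13,000 lines.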