Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies

Yilun Liu, Shimin Tao, Weibin Meng, Jingyu Wang, Wenbing Ma, Yanqing Zhao, Yuhang Chen, Hao Yang, Yanfei Jiang, Xun Chen
DOI: https://doi.org/10.48550/arXiv.2308.07610
2024-01-26
Abstract: Automated log analysis is crucial in modern software-intensive systems for facilitating program comprehension throughout software maintenance and engineering life cycles. Existing methods perform tasks such as log parsing and log anomaly detection by providing a single prediction value without interpretation. However, given the increasing volume of system events, the limited interpretability of analysis results hinders analysts' comprehension of program status and their ability to take appropriate actions. Moreover, these methods require substantial in-domain training data, and their performance declines sharply (by up to 62.5%) in online scenarios involving unseen logs from new domains, a common occurrence due to rapid software updates. In this paper, we propose LogPrompt, a novel interpretable log analysis approach for online scenarios. LogPrompt employs large language models (LLMs) to perform online log analysis tasks via a suite of advanced prompt strategies tailored for log tasks, which enhances LLMs' performance by up to 380.7% compared with simple prompts. Experiments on nine publicly available evaluation datasets across two tasks demonstrate that LogPrompt, despite requiring no in-domain training, outperforms existing approaches trained on thousands of logs by up to 55.9%. We also conduct a human evaluation of LogPrompt's interpretability with six practitioners possessing over 10 years of experience, who rated the generated content highly in terms of usefulness and readability (4.42/5 on average). LogPrompt also exhibits remarkable compatibility with open-source and smaller-scale LLMs, making it flexible for practical deployment. The code for LogPrompt is available at https://github.com/lunyiliu/LogPrompt.
Software Engineering, Computation and Language
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses two main challenges in automated log analysis:

1. **Lack of Adaptability in Online Scenarios**:
   - Existing log analysis methods degrade sharply (by up to 62.5%) on unseen logs from new domains. Software systems are upgraded frequently to add features, fix bugs, or improve performance, and each upgrade can introduce new log types and make older log versions incompatible. As a result, there is often too little historical log data to train a model effectively.
   - In the extreme case of a newly launched service, no domain-specific log data exists at all, so existing methods cannot be trained effectively.
2. **Limited Interpretability of Results**:
   - Current methods typically output a single prediction value with no explanation, which makes it hard for analysts to understand the program's state and take appropriate actions.
   - Interpretable outputs help analysts understand the program, detect false positives, trace root causes, and choose appropriate measures.

### Solution

To address these challenges, the authors propose **LogPrompt**, an interpretable online log analysis method based on large language models (LLMs). LogPrompt improves LLM performance with three prompt strategies:

1. **Self-prompt**:
   - Use the LLM itself to generate candidate prompts suited to the log analysis task.
   - Evaluate each candidate on a small task-specific dataset and select the best-performing prompt.
2. **Chain-of-Thought Prompt (CoT)**:
   - Emulate the multi-step reasoning that humans use on complex problems to improve LLM performance on challenging tasks.
   - Define intermediate steps, either explicitly or implicitly, so that the generated answers are more reasonable and logical.
3. **In-context Prompt**:
   - Use a small number of labeled log examples to establish the task context, so the LLM adapts online without iterative training.
   - Conditioned on these input-label pairs (demonstrations), the LLM makes predictions on new inputs.

### Experimental Results

The experiments show that LogPrompt matches or outperforms existing methods on multiple public datasets without any in-domain training. Specifically:

- **Task Performance**: Across nine public datasets and two tasks (log parsing and anomaly detection), LogPrompt outperforms existing approaches trained on thousands of logs by up to 55.9%. Its prompt strategies improve LLM performance by up to 380.7% compared with simple prompts.
- **Interpretability**: In a human evaluation, six practitioners with over 10 years of experience rated the generated content highly for usefulness and readability, with an average score of 4.42/5.

### Conclusion

LogPrompt addresses the adaptability and interpretability problems of online log analysis by combining large language models with prompt strategies tailored to log tasks. It requires no in-domain training and is compatible with open-source and smaller-scale LLMs, which makes it flexible for practical deployment.
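
As an illustration of the prompt strategies described under Solution, the following is a minimal Python sketch, not the authors' implementation. It assembles an in-context prompt from labeled demonstrations with a chain-of-thought style instruction, and it picks the best of several candidate instructions by scoring them on a small labeled set, as in the self-prompt selection step. The instruction wording, the function names (`build_prompt`, `select_best_prompt`), and `toy_predict` are all assumptions made for illustration. `toy_predict` is a keyword matcher standing in for a real LLM call, and the candidates are written by hand here rather than generated by an LLM.

```python
from typing import Callable, Sequence

# Hypothetical instruction text; the paper's actual prompt wording is not reproduced here.
COT_INSTRUCTION = (
    "Classify each log as normal or abnormal. "
    "Reason step by step and give a brief justification before the label."
)


def build_prompt(
    instruction: str,
    demos: Sequence[tuple[str, str]],
    logs: Sequence[str],
) -> str:
    """Assemble an in-context prompt: instruction, labeled demonstrations, query logs."""
    lines = [instruction, "", "Examples:"]
    for text, label in demos:
        lines.append(f"Log: {text}")
        lines.append(f"Label: {label}")
    lines.append("")
    lines.append("Logs to analyze:")
    for i, text in enumerate(logs, start=1):
        lines.append(f"({i}) {text}")
    return "\n".join(lines)


def select_best_prompt(
    candidates: Sequence[str],
    dev_set: Sequence[tuple[str, str]],
    predict: Callable[[str, str], str],
) -> str:
    """Score each candidate instruction on a small labeled set and return the best one."""

    def accuracy(instruction: str) -> float:
        hits = sum(predict(instruction, text) == label for text, label in dev_set)
        return hits / len(dev_set)

    # On ties, max() keeps the first candidate with the top score.
    return max(candidates, key=accuracy)


def toy_predict(instruction: str, log: str) -> str:
    """Toy stand-in for an LLM call (assumption): keyword matching, not a real model."""
    if "abnormal" in instruction and "error" in log.lower():
        return "abnormal"
    return "normal"


if __name__ == "__main__":
    dev_set = [
        ("Connection error on node 3", "abnormal"),
        ("Session opened for user root", "normal"),
    ]
    candidates = ["Summarize each log.", COT_INSTRUCTION]
    best = select_best_prompt(candidates, dev_set, toy_predict)
    print(build_prompt(best, dev_set, ["Disk error while writing block 42"]))
```

In a real setting, `predict` would wrap a call to an LLM and parse the label from its response, and the selection metric would match the task (for example, F1 for anomaly detection rather than plain accuracy).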