Abstract: Automated log analysis is crucial in modern software-intensive systems for facilitating program comprehension throughout software maintenance and engineering life cycles. Existing methods perform tasks such as log parsing and log anomaly detection by providing a single prediction value without interpretation. However, given the increasing volume of system events, the limited interpretability of analysis results hinders analysts' comprehension of program status and their ability to take appropriate actions. Moreover, these methods require substantial in-domain training data, and their performance declines sharply (by up to 62.5%) in online scenarios involving unseen logs from new domains, a common occurrence due to rapid software updates. In this paper, we propose LogPrompt, a novel interpretable log analysis approach for online scenarios. LogPrompt employs large language models (LLMs) to perform online log analysis tasks via a suite of advanced prompt strategies tailored for log tasks, which enhances LLMs' performance by up to 380.7% compared with simple prompts. Experiments on nine publicly available evaluation datasets across two tasks demonstrate that LogPrompt, despite requiring no in-domain training, outperforms existing approaches trained on thousands of logs by up to 55.9%. We also conduct a human evaluation of LogPrompt's interpretability with six practitioners, each with over 10 years of experience, who rated the generated content highly for usefulness and readability (4.42/5 on average). LogPrompt is also compatible with open-source and smaller-scale LLMs, making it flexible for practical deployment. Code of LogPrompt is available at https://github.com/lunyiliu/LogPrompt.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses two main challenges in automated log analysis:

1. **Lack of Adaptability in Online Scenarios**:
   - Existing log analysis methods degrade sharply (by up to 62.5%) on unseen logs from new domains. In practice, software systems are upgraded frequently to introduce features, fix bugs, or improve performance. These upgrades can produce new log types and make older log versions incompatible, so there is often too little historical log data to train a model effectively.
   - In the extreme case of a newly launched service, no domain-specific log data exists at all, so existing methods cannot be trained.

2. **Limited Interpretability of Results**:
   - Current methods typically output a single prediction value with no explanation, which makes it hard for analysts to understand the program's state and take appropriate action.
   - Interpretable outputs aid program comprehension and also help analysts detect false positives, trace root causes, and choose appropriate measures.

### Solution

To address these challenges, the authors propose **LogPrompt**, an interpretable online log analysis method based on large language models (LLMs). LogPrompt improves LLM performance through three prompting strategies:

1. **Self-prompt**:
   - Use the LLM itself to generate candidate prompts for the log analysis task.
   - Evaluate each candidate on a small task-specific dataset and select the best-performing prompt.

2. **Chain-of-Thought Prompt (CoT)**:
   - Simulate the multi-step reasoning humans use on complex problems to improve LLM performance on challenging tasks.
   - Define intermediate steps, either explicitly or implicitly, so that the generated answers are better reasoned and more logical.

3. **In-context Prompt**:
   - Use a small number of labeled log examples to establish the task context, so the LLM adapts online without iterative training.
   - By conditioning on input-label pairs (demonstrations), the LLM can make predictions on new inputs.

### Experimental Results

Experiments show that LogPrompt matches or exceeds existing methods on multiple public datasets without requiring domain-specific training. Specifically:

- **Task Performance**: On nine public datasets across two tasks (log parsing and anomaly detection), LogPrompt outperforms existing approaches trained on thousands of logs by up to 55.9%.
- **Human Evaluation of Interpretability**: The content generated by LogPrompt was rated highly for usefulness and readability by six practitioners with over 10 years of experience, with an average score of 4.42/5.

### Conclusion

LogPrompt addresses the adaptability and interpretability problems of online log analysis by combining large language models with advanced prompting strategies, offering a flexible and efficient method for practical use.
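
As a rough illustration of how the three prompting strategies in the Solution section might fit together, the following Python sketch assembles an in-context prompt with an implicit chain-of-thought instruction and then picks the best of several candidate instructions on a small labeled set. It is not LogPrompt's actual code: the names `build_prompt`, `parse_label`, `select_prompt`, and the `LLM` callable, along with the prompt wording and the `Label:` output convention, are all assumptions made for this example. The candidate instructions are assumed to have been generated by the LLM beforehand.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types: an LLM is any function from prompt text to reply text,
# and an example is a (log line, label) pair.
LLM = Callable[[str], str]
Example = Tuple[str, str]


def build_prompt(instruction: str, demos: Sequence[Example], log: str,
                 chain_of_thought: bool = True) -> str:
    """Assemble an in-context prompt, optionally asking for step-by-step reasoning."""
    parts = [instruction]
    if chain_of_thought:
        # Implicit CoT: ask for a justification before the final answer.
        parts.append("Explain your reasoning step by step, then give the final "
                     "label on a new line starting with 'Label:'.")
    # In-context demonstrations: labeled input-label pairs that set the task context.
    for demo_log, demo_label in demos:
        parts.append(f"Log: {demo_log}\nLabel: {demo_label}")
    parts.append(f"Log: {log}")
    return "\n\n".join(parts)


def parse_label(reply: str) -> str:
    """Return the text after the last 'Label:' marker, or the whole reply if absent."""
    marker = "Label:"
    idx = reply.rfind(marker)
    text = reply[idx + len(marker):] if idx >= 0 else reply
    return text.strip().lower()


def select_prompt(candidates: Sequence[str], llm: LLM, demos: Sequence[Example],
                  dev_set: Sequence[Example]) -> Tuple[str, float]:
    """Self-prompt selection: score each candidate instruction on a small
    labeled set and return the best one with its accuracy."""
    def accuracy(instruction: str) -> float:
        hits = sum(
            parse_label(llm(build_prompt(instruction, demos, log))) == label.lower()
            for log, label in dev_set
        )
        return hits / len(dev_set)

    scored = [(accuracy(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client. Accuracy is used here only as a simple stand-in for whatever task metric is appropriate.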

Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies
