Abstract: Log parsing is a critical step that transforms unstructured log data into structured formats, facilitating subsequent log-based analysis. Traditional syntax-based log parsers are efficient and effective, but their accuracy often drops when they process logs that deviate from their predefined rules. Recently, large language model (LLM)-based log parsers have shown superior parsing accuracy. However, existing LLM-based parsers face three main challenges: 1) time-consuming and labor-intensive manual labeling for fine-tuning or in-context learning, 2) increased parsing costs due to the vast volume of log data and the limited context size of LLMs, and 3) privacy risks from sending sensitive log information to commercial models such as ChatGPT. To overcome these limitations, this paper introduces OpenLogParser, an unsupervised log parsing approach that leverages open-source LLMs (i.e., Llama3-8B) to enhance privacy and reduce operational costs while achieving state-of-the-art parsing accuracy. OpenLogParser first groups logs with similar static text but varying dynamic variables using a fixed-depth grouping tree. It then parses the logs within each group using three components: i) similarity scoring-based retrieval-augmented generation, which selects diverse logs within each group based on Jaccard similarity, helping the LLM distinguish between static text and dynamic variables; ii) self-reflection, which iteratively queries the LLM to refine log templates and improve parsing accuracy; and iii) log template memory, which stores parsed templates to reduce LLM queries and improve parsing efficiency. Our evaluation on LogHub-2.0 shows that OpenLogParser achieves 25% higher parsing accuracy and processes logs 2.7 times faster than state-of-the-art LLM-based parsers. In short, OpenLogParser addresses the privacy and cost concerns of using commercial LLMs while achieving state-of-the-art parsing efficiency and accuracy.
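
Two of the components named in the abstract, Jaccard-based selection of diverse logs and the log template memory, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `jaccard`, `select_diverse`, and `TemplateMemory`, the whitespace tokenization, the greedy selection strategy, and the `<*>` placeholder for dynamic variables are all assumptions made for illustration.

```python
import re


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the whitespace-token sets of two log lines."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def select_diverse(logs, k=3):
    """Greedily pick up to k logs from one group (assumed strategy).

    Each step adds the log whose highest similarity to the already
    picked logs is lowest, so the sample varies as much as possible.
    """
    unique = list(dict.fromkeys(logs))  # drop duplicates, keep order
    if not unique:
        return []
    picked = [unique[0]]
    while len(picked) < min(k, len(unique)):
        rest = [log for log in unique if log not in picked]
        best = min(rest, key=lambda log: max(jaccard(log, p) for p in picked))
        picked.append(best)
    return picked


class TemplateMemory:
    """Stores parsed templates; '<*>' (assumed) marks a dynamic variable."""

    def __init__(self):
        self._patterns = []  # list of (compiled regex, template)

    def add(self, template: str) -> None:
        # Escape the static text and let each '<*>' match one token.
        parts = [re.escape(p) for p in template.split("<*>")]
        regex = re.compile("^" + r"(\S+)".join(parts) + "$")
        self._patterns.append((regex, template))

    def match(self, log: str):
        """Return a stored template that matches the log, or None."""
        for regex, template in self._patterns:
            if regex.match(log):
                return template
        return None


group = [
    "Connected to 10.0.0.1 port 22",
    "Connected to 10.0.0.2 port 22",
    "Connected to 192.168.1.5 port 8080",
]
samples = select_diverse(group, k=2)  # logs that would go into the LLM prompt

memory = TemplateMemory()
memory.add("Connected to <*> port <*>")  # template as an LLM might return it
cached = memory.match("Connected to 10.0.0.9 port 443")
```

In this sketch, a log that matches a stored template is resolved from memory without querying the LLM, which is the efficiency benefit the abstract attributes to the template memory; only unmatched logs would trigger a new prompt built from the selected samples.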

LogGenius: an Unsupervised Log Parsing Framework with Zero-shot Prompt Engineering

LLM-powered Zero-shot Online Log Parsing

Self-Supervised Log Parsing

LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models

LogPrompt: Prompt Engineering Towards Zero-Shot and Interpretable Log Analysis

Towards robust log parsing using self-supervised learning for system security analysis

LibreLog: Accurate and Efficient Unsupervised Log Parsing Using Open-Source Large Language Models

High-precision Online Log Parsing with Large Language Models

LUNAR: Unsupervised LLM-based Log Parsing

Prompting for Automatic Log Template Extraction

IPLog: An Efficient Log Parsing Method Based on Few-Shot Learning

DLLog: An Online Log Parsing Approach for Large-Scale System

Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies

Log Parsing with Self-Generated In-Context Learning and Self-Correction

Log Parsing with Prompt-based Few-shot Learning

Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging

Log Parsing with Generalization Ability under New Log Types

Self-Evolutionary Group-wise Log Parsing Based on Large Language Model

Self-supervised log parsing using semantic contribution difference

A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We?

HELP: Hierarchical Embeddings-based Log Parsing