Abstract: Logs are widely used to diagnose system behavior through automatic log mining. Log parsing is an important data preprocessing step that converts semi-structured log messages into structured data, which serves as the feature input for log mining. Many studies propose new log parsers. However, to the best of our knowledge, no previous study comprehensively investigates the effectiveness of log parsers in industrial practice. In this paper, we conduct an empirical study of six state-of-the-art log parsers on 10 microservice applications of Ant Group. Our empirical results highlight two challenges for log parsing in practice: 1) Various separators. A single log message can contain several different separators, and the separators also differ across event templates and across applications. Current log parsers perform poorly because they do not account for this variety. 2) Various lengths due to nested objects. Log messages belonging to the same event template may have different lengths because of nested objects. The log messages of 6 out of 10 microservice applications at Ant Group have various lengths due to nested objects, and 4 out of 6 state-of-the-art log parsers cannot handle them. To address these two challenges, we propose an improved log parser named Drain+, based on the state-of-the-art log parser Drain. Drain+ includes two innovative components: a statistical-based separator generation component, which automatically generates separators for log message splitting, and a candidate event template merging component, which merges candidate event templates using a template similarity method. We evaluate the effectiveness of Drain+ on 10 microservice applications of Ant Group and 16 public datasets. The results show that Drain+ outperforms the six state-of-the-art log parsers on industrial applications and public datasets. Finally, we summarize our observations into a road ahead for log parsing to inspire other researchers and practitioners.
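
The two components named in the abstract can be illustrated with a minimal Python sketch. This is not the Drain+ implementation: the abstract does not specify how separators are scored or how template similarity is computed, so the frequency heuristic in `candidate_separators`, the `difflib`-based `similarity`, the thresholds, and all function names are assumptions made purely for illustration.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

WILDCARD = "<*>"  # placeholder for variable parts of a template


def candidate_separators(messages, min_ratio=0.5):
    """Assumed heuristic: keep punctuation characters that occur in at
    least `min_ratio` of the messages."""
    counts = Counter()
    for msg in messages:
        counts.update({ch for ch in msg if not ch.isalnum() and not ch.isspace()})
    return sorted(ch for ch, n in counts.items() if n / len(messages) >= min_ratio)


def tokenize(message, separators):
    """Split a message on whitespace and on the generated separators."""
    pattern = "[" + re.escape("".join(separators)) + r"\s]+"
    return [tok for tok in re.split(pattern, message) if tok]


def similarity(a, b):
    """Assumed similarity: difflib ratio over token sequences, which
    tolerates templates of different lengths."""
    return SequenceMatcher(a=a, b=b, autojunk=False).ratio()


def merge_pair(a, b):
    """Keep tokens shared by both templates; collapse each differing
    region (of any length) into a single wildcard."""
    merged = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if op == "equal":
            merged.extend(a[i1:i2])
        elif not merged or merged[-1] != WILDCARD:
            merged.append(WILDCARD)
    return merged


def merge_templates(templates, threshold=0.5):
    """Greedily merge each candidate template into the first sufficiently
    similar existing template, or keep it as a new one."""
    merged = []
    for tpl in templates:
        for i, existing in enumerate(merged):
            if similarity(existing, tpl) >= threshold:
                merged[i] = merge_pair(existing, tpl)
                break
        else:
            merged.append(list(tpl))
    return merged


if __name__ == "__main__":
    logs = [
        "user=alice|action=login|status=ok",
        "user=bob|action=login|status=ok",
        "user=carol|action=logout|status=ok",
    ]
    seps = candidate_separators(logs)
    candidates = [tokenize(line, seps) for line in logs]
    for template in merge_templates(candidates):
        print(" ".join(template))
```

Collapsing every differing region into one wildcard is what lets two messages of different lengths (for example, one with a longer nested object) map to the same template in this sketch; a production parser would likely need a more careful similarity measure and separator scoring.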

Token Interdependency Parsing (Tipping) -- Fast and Accurate Log Parsing

Towards Automated Log Parsing for Large-Scale Log Data Analysis

ML-Parser: an Efficient and Accurate Online Log Parser

Logram: Efficient Log Parsing Using n-Gram Dictionaries

Tools and Benchmarks for Automated Log Parsing

Brain: Log Parsing with Bidirectional Parallel Tree

HELP: Hierarchical Embeddings-based Log Parsing

AS-Parser: Log Parsing Based on Adaptive Segmentation

IPLog: An Efficient Log Parsing Method Based on Few-Shot Learning

Preprocessing is All You Need: Boosting the Performance of Log Parsers With a General Preprocessing Framework

Cognition: Accurate and Consistent Linear Log Parsing Using Template Correction

LLM-powered Zero-shot Online Log Parsing

High-precision Online Log Parsing with Large Language Models

A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We?

LogPTR: Variable-Aware Log Parsing with Pointer Network

Investigating and Improving Log Parsing in Practice

Log Parsing with Generalization Ability under New Log Types

Log Parsing Evaluation in the Era of Modern Software Systems

On Automatic Parsing of Log Records

A Directed Acyclic Graph Approach to Online Log Parsing