Revisiting Log Parsing: The Present, the Future, and the Uncertainties

Zhijing Li, Qiuai Fu, Zhijun Huang, Jianbo Yu, Yiqian Li, Yuanhao Lai, Yuchi Ma
DOI: https://doi.org/10.1109/tr.2023.3340020
IF: 5.883
2024-01-01
IEEE Transactions on Reliability
Abstract: Over the past decade, the volume of software runtime logs has grown rapidly, spawning a line of automated log analysis research based on machine learning and data mining algorithms. In a typical log analysis workflow, log parsing, which transforms unstructured or semistructured logs into structured logs, is crucial to various downstream algorithms. Although state-of-the-art (SOTA) parsers achieve extremely high accuracy, recent research shows that they are far from useful under stricter evaluation metrics. As a result, researchers and practitioners lack a clear picture of the current state of log parsing research and of what may be important to explore next. To this end, we conduct an empirical study that revisits log parsing by running extensive experiments with SOTA parsers on 16 widely used log datasets under five evaluation metrics and different preprocessing settings. Our results show that the performance of log parsers varies significantly across evaluation metrics. Preprocessing also plays an important role in the evaluation: in particular, preprocessing with common regular expressions can cause a difference of 0.38 in group accuracy, highlighting the importance of reporting preprocessing details in parsing research. We also generalize the word-level regular expressions used in preprocessing and apply them to parse entire logs, which yields surprisingly decent accuracy. These results suggest that formulating log parsing as a word-level classification task is a feasible future direction. Moreover, we find that the most widely used dataset (i.e., LogHub) contains labeling errors. To address this issue, we make an extensive manual effort to fix the errors in the dataset, providing a revised ground truth for future log parsing research. On the revised dataset, our simple word-level regular-expression-based parser achieves a precision-template accuracy of 0.97 on the Spark dataset and an average recall-template accuracy of 0.93 across the 16 datasets, outperforming all existing parsers.
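The abstract frames log parsing as word-level classification: each token is judged to be either a constant or a variable, and variables are masked to produce a template. The Python sketch below illustrates that idea together with group accuracy (GA), which is commonly defined in the log-parsing benchmark literature as the fraction of messages whose predicted group contains exactly the same messages as their ground-truth group. The regular expressions, the function names `is_variable`, `to_template`, and `group_accuracy`, and the HDFS-style sample lines are hypothetical illustrations, not the authors' parser or their preprocessing rules.

```python
import re
from collections import defaultdict

# Hypothetical variable patterns (NOT the paper's actual regexes).
VARIABLE_PATTERNS = [
    re.compile(r"\d+(\.\d+)*"),                   # integers and dotted numbers
    re.compile(r"(\d{1,3}\.){3}\d{1,3}(:\d+)?"),  # IPv4 address with optional port
    re.compile(r"0x[0-9a-fA-F]+"),                 # hexadecimal values
    re.compile(r"(/[\w.-]+)+/?"),                  # Unix-style paths
    re.compile(r"blk_-?\d+"),                       # HDFS-style block IDs
]


def is_variable(token: str) -> bool:
    """Word-level classification: a token is a variable if any pattern matches it entirely."""
    return any(p.fullmatch(token) for p in VARIABLE_PATTERNS)


def to_template(message: str) -> str:
    """Split on whitespace and replace every variable token with the <*> wildcard."""
    return " ".join("<*>" if is_variable(tok) else tok for tok in message.split())


def group_accuracy(predicted: list[str], truth: list[str]) -> float:
    """Fraction of messages whose predicted group equals their ground-truth group exactly."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not predicted:
        return 0.0
    pred_groups: dict[str, set[int]] = defaultdict(set)
    true_groups: dict[str, set[int]] = defaultdict(set)
    for i, (p, t) in enumerate(zip(predicted, truth)):
        pred_groups[p].add(i)
        true_groups[t].add(i)
    correct = 0
    for members in pred_groups.values():
        # Look up the ground-truth group of any one member; a predicted group
        # counts as correct only if the two sets of message indices are identical.
        some_index = next(iter(members))
        if members == true_groups[truth[some_index]]:
            correct += len(members)
    return correct / len(predicted)


if __name__ == "__main__":
    logs = [
        "Received block blk_123 of size 67108864 from /10.0.0.1",
        "Received block blk_-456 of size 1024 from /10.0.0.2",
        "Deleting block blk_789 file /data/dfs/blk_789",
    ]
    templates = [to_template(m) for m in logs]
    for t in templates:
        print(t)
    print(group_accuracy(templates, ["E1", "E1", "E2"]))
```

Because tokens are split only on whitespace and patterns must match a whole token, a token such as `size:1024` would stay a constant; practical preprocessing usually splits on such delimiters first, which is one reason preprocessing choices can shift reported accuracy.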
Subject categories: Engineering, Electrical & Electronic; Computer Science, Software Engineering; Computer Science, Hardware & Architecture