A Robust Log Classification Approach Based on Natural Language Processing

Dongjiang Li, Jing Zhang, Jicheng Yang, Xianbo Zhang, Feng Lin, Chao Wang, Liang Chang
DOI: https://doi.org/10.1109/icccr56747.2023.10194171
2023-01-01
Abstract: Log data, which records the operating state of a computer system, is essential for understanding that system's behavior. Log classification helps engineers monitor the system's running status and analyze system failures. To improve the representation quality of log templates and reduce the inference time of the classification model, we propose a new log classification method based on natural language processing techniques. Three embedding methods are used for word vectorization; together they improve the numerical representation of log templates by making full use of the semantic information, part-of-speech (PoS) information, and positional information of the words in each template. This vectorization gives the log classification model more informative inputs and helps it produce better results. The classification model consists of TextCNN and a nonlinear classifier. We use knowledge distillation to transfer knowledge from BERT to TextCNN, improving both the accuracy and the efficiency of the proposed model. The approach is evaluated on five public datasets and one private dataset collected from a leading global e-commerce company. The experimental results show that the proposed method achieves better classification results than other state-of-the-art log classification methods.
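
The abstract does not state the loss used to distill BERT into TextCNN. As an illustration only, the sketch below shows the widely used temperature-scaled distillation objective (soft teacher targets plus hard-label cross-entropy) for a single log template. The function names, the `temperature` and `alpha` parameters, and the example logits are all assumptions made for this sketch and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical distillation loss for one sample.

    Combines KL(teacher || student) on temperature-softened
    distributions with cross-entropy on the ground-truth label.
    `alpha` weights the soft (teacher) term against the hard term.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # Soft-target term: how far the student is from the teacher.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)

    # Hard-target term: ordinary cross-entropy at temperature 1.
    hard = -math.log(softmax(student_logits)[true_label])

    # The T^2 factor keeps the soft term's gradient scale comparable
    # across temperatures (standard practice, assumed here).
    return alpha * (temperature ** 2) * kl + (1 - alpha) * hard


# Made-up logits over three log classes for one template.
teacher = [3.0, 0.5, -1.0]   # stands in for the BERT teacher's output
student = [1.0, 1.2, -0.5]   # stands in for the TextCNN student's output
loss = distillation_loss(student, teacher, true_label=0)
```

In this formulation the soft term vanishes when the student reproduces the teacher's logits exactly, so with `alpha=1.0` the loss is driven only by disagreement with the teacher.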