LogClass: Anomalous Log Identification and Classification with Partial Labels

Weibin Meng, Ying Liu, Shenglin Zhang, Federico Zaiter, Yuzhe Zhang, Yuheng Huang, Zhaoyang Yu, Yuzhi Zhang, Lei Song, Ming Zhang, Dan Pei
DOI: https://doi.org/10.1109/tnsm.2021.3055425
2021-01-01
IEEE Transactions on Network and Service Management
Abstract: Logs are essential to the management of networks and services. However, manually identifying and classifying anomalous logs is time-consuming, error-prone, and labor-intensive. Rule-based approaches, in turn, cannot handle the challenges that new types of logs and partial labels pose for anomalous log identification and classification. We propose LogClass, a framework that automatically and robustly identifies and classifies anomalous logs for networks and services based on partial labels. LogClass combines a word representation method, a positive and unlabeled learning (PU learning) model, and a machine learning classifier. We also propose a novel Inverse Location Frequency (ILF) method to properly weight the words of logs during feature construction. We evaluate LogClass on more than 18 million real-world switch logs and six public log datasets. It achieves F1 scores of 99.56% and 98% in anomalous log identification on switch logs and publicly available supercomputer logs, respectively, and an F1 score very close to one in anomalous log classification. Moreover, extensive experiments demonstrate LogClass's superior performance in handling partial labels and new types of logs.
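The abstract names ILF but does not define it. As a rough illustration only, the sketch below assumes ILF works by analogy with inverse document frequency, except that it counts the distinct token positions ("locations") at which a word occurs rather than the documents that contain it: ILF(w) = log(L / L_w), where L is the length of the longest log and L_w is the number of distinct positions where w appears. The formula, the whitespace tokenization, and the function name `ilf_weights` are all assumptions made for this example and may differ from the paper's method.

```python
import math
from collections import defaultdict


def ilf_weights(logs):
    """Return a hypothetical ILF weight for every word in `logs`.

    Assumed formulation (not taken from the abstract):
        ILF(w) = log(L / L_w)
    L   = number of token positions in the longest log
    L_w = number of distinct positions at which word w occurs
    """
    # Assumption: logs are tokenized on whitespace.
    tokenized = [log.split() for log in logs]
    max_len = max((len(tokens) for tokens in tokenized), default=0)

    # Collect the set of positions where each word has been seen.
    positions = defaultdict(set)
    for tokens in tokenized:
        for index, word in enumerate(tokens):
            positions[word].add(index)

    # Words pinned to few positions receive a larger weight.
    return {word: math.log(max_len / len(seen)) for word, seen in positions.items()}


logs = [
    "link down on port 1",
    "link up on port 2",
    "port flapping detected",
]
weights = ilf_weights(logs)
```

Under this assumed formula, a word that always occupies the same position (here `link`) gets a higher weight than one that appears at several positions (here `port`). The resulting weights could then scale a bag-of-words vector before it is passed to the PU learning model and the classifier.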