Multi-Parameter Log Anomaly Detection with an Unsupervised Learning Approach

Hironori Uchida, Keitaro Tominaga, Hideki Itai, Yujie Li, Yoshihisa Nakatoh
DOI: https://doi.org/10.1109/pcds61776.2024.10743635
2024-01-01
Abstract: Automatic anomaly detection using system logs faces three main challenges: (1) a lack of diversity in research datasets (labeling requires expert knowledge), (2) anomaly patterns that exist only as sequence anomalies, and (3) the difficulty of unsupervised learning. To address these issues, we created our own dataset for parameter anomaly detection. This study focuses on detecting pattern anomalies involving multiple parameters (string type, integer type, string type). For example, a normal pattern is defined as “State=B, Value=80~2500, InfoD,” and any Value outside this range is considered an anomaly when State=B and InfoD. To detect such anomalies, we propose a new method for parameter anomaly detection using unsupervised learning with BertForMaskedLM. BertForMaskedLM predicts the masked portions (tokens) of an input string and outputs probabilities. By masking the parameter portions in which we want to detect anomalies, we can perform anomaly detection for each parameter. Training the model on normal patterns only results in significantly lower prediction probabilities for anomalous patterns, which allows anomalies to be detected by thresholding. Using this method, we achieved high F1 scores: 0.93 for patterns focusing on State, 0.89 for patterns focusing on Value, and 1.0 for patterns focusing on Info.
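The mask-then-threshold idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: in place of BertForMaskedLM it uses a simple frequency count over normal lines as a stand-in for the masked-token probability, so that the example runs without any model download. The log format (`State=B, Value=100, Info=D`), the names `mask_parameter`, `MaskedValueModel`, and `is_anomalous`, and the threshold of 0.5 are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

MASK = "[MASK]"  # placeholder token; mirrors BERT's mask token by convention


def mask_parameter(line: str, key: str) -> Tuple[str, str]:
    """Replace the value of `key` with MASK; return (masked line, original value)."""
    pattern = re.compile(rf"({re.escape(key)}=)([^,\s]+)")
    match = pattern.search(line)
    if match is None:
        raise ValueError(f"{key!r} not found in {line!r}")
    masked = pattern.sub(lambda m: m.group(1) + MASK, line, count=1)
    return masked, match.group(2)


class MaskedValueModel:
    """Toy stand-in for a masked language model (assumption, not BERT).

    Estimates P(value | masked context) from counts over normal lines only.
    """

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = {}

    def fit(self, normal_lines: Iterable[str], key: str) -> None:
        # "Training" sees normal patterns only, as in the abstract.
        for line in normal_lines:
            context, value = mask_parameter(line, key)
            self.counts.setdefault(context, Counter())[value] += 1

    def probability(self, line: str, key: str) -> float:
        context, value = mask_parameter(line, key)
        seen = self.counts.get(context)
        if not seen:
            return 0.0  # context never observed among normal lines
        return seen[value] / sum(seen.values())


def is_anomalous(probability: float, threshold: float = 0.5) -> bool:
    """Flag a parameter whose predicted probability falls below the threshold."""
    return probability < threshold


# Hypothetical normal-only training data.
normal = [
    "State=B, Value=100, Info=D",
    "State=B, Value=100, Info=D",
    "State=A, Value=5, Info=C",
]
model = MaskedValueModel()
model.fit(normal, key="State")

for line in ["State=B, Value=100, Info=D", "State=A, Value=100, Info=D"]:
    p = model.probability(line, key="State")
    print(line, p, "ANOMALY" if is_anomalous(p) else "normal")
```

A count-based estimate cannot generalize over integer ranges such as Value=80~2500, which is one reason a learned masked language model is the more natural choice for the Value parameter. With a real model, `probability` would instead return the model's softmax probability for the original token at the masked position.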