A Confidence-Guided Anomaly Detection Approach Jointly Using Multiple Machine Learning Algorithms

Xueshuo Xie, Zongming Jin, Qingqi Han, Shenwei Huang, Tao Li
DOI: https://doi.org/10.1007/978-3-030-37352-8_8
2019-01-01
Abstract: Log data contains rich and valuable information about system states and behavior, which can be used to diagnose system failures. Anomaly detection on large-scale log data plays a key role in building secure and trustworthy systems. Anomaly detection models based on machine learning have achieved good results in practical applications. However, logs generated by modern large-scale distributed systems are more complex than ever before in terms of data size and variety. As a result, a traditional anomaly detection model built on a single machine learning algorithm faces the model aging problem. We design an anomaly detection model that combines multiple machine learning algorithms. Using conformal prediction, we calculate the confidence of each algorithm for each log to be detected, and then use statistical analysis to tag each log with a trusted label. The approach was tested on the public HDFS 100k log dataset, and the results show that our model is more accurate.
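The confidence-guided idea in the abstract can be illustrated with a minimal sketch. It assumes a standard conformal prediction setup in which each algorithm supplies a nonconformity score per candidate label, and it uses the common definition of confidence as one minus the second-largest p-value. The function names (`p_value`, `conformal_label`, `trusted_label`) and the confidence-weighted vote used to combine algorithms are hypothetical choices for illustration, not the authors' published procedure.

```python
from typing import Dict, List, Tuple


def p_value(calib_scores: List[float], test_score: float) -> float:
    """Share of calibration nonconformity scores at least as large as the test score."""
    at_least = sum(1 for s in calib_scores if s >= test_score)
    return (at_least + 1) / (len(calib_scores) + 1)


def conformal_label(calib: Dict[str, List[float]],
                    test: Dict[str, float]) -> Tuple[str, float]:
    """Return (predicted label, confidence) for one algorithm.

    calib[label]: nonconformity scores of calibration logs carrying that label.
    test[label]:  the test log's nonconformity score if it were given that label.
    Assumes at least two candidate labels (e.g. "normal" and "anomaly").
    """
    pvals = {label: p_value(calib[label], test[label]) for label in calib}
    ranked = sorted(pvals, key=lambda k: pvals[k], reverse=True)
    confidence = 1.0 - pvals[ranked[1]]  # one minus the second-largest p-value
    return ranked[0], confidence


def trusted_label(per_algorithm: List[Tuple[str, float]]) -> str:
    """Hypothetical combination rule: sum confidence per label, pick the largest total."""
    totals: Dict[str, float] = {}
    for label, conf in per_algorithm:
        totals[label] = totals.get(label, 0.0) + conf
    return max(totals, key=lambda k: totals[k])
```

In this sketch, each algorithm would call `conformal_label` on the same log, and `trusted_label` would then merge the resulting (label, confidence) pairs into a single decision. Other combination rules, such as trusting only the most confident algorithm, fit the same interface.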