Efficient Log-based Anomaly Detection with Knowledge Distillation

Huy-Trung Nguyen, Lam-Vien Nguyen, Van-Hoang Le, Hongyu Zhang, Manh-Trung Le
DOI: https://doi.org/10.1109/icws62655.2024.00078
2024-01-01
Abstract: Many systems produce logs for troubleshooting purposes. Detecting abnormal events is crucial to maintaining regular operations and ensuring system security. Despite the achievements of deep learning models in anomaly detection, applying them remains challenging in some scenarios. A common example is deployment on resource-constrained devices such as IoT devices, whose computational resources are limited. We identify two main problems in adopting these deep learning models in practice: (1) large models cannot be deployed on resource-constrained devices because of their size and the time needed to analyze data with them, and (2) simple models cannot achieve satisfactory detection accuracy. In this work, we propose DistilLog, a novel lightweight method for detecting anomalies from system logs, to overcome these problems. DistilLog uses a pretrained word2vec model to represent log event templates as semantic vectors and applies the PCA dimensionality reduction algorithm to minimize the computational and storage burden. Knowledge Distillation is then used to reduce the size of the detection model while maintaining high detection accuracy. The experimental results show that DistilLog achieves high F-measures of 0.964 and 0.961 on the HDFS and BGL datasets, respectively, while keeping the model size minimal and achieving the fastest detection speed. This effectiveness and efficiency show that the proposed model can be deployed on resource-constrained systems, indicating its potential for widespread use in most scenarios.
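The abstract does not specify the exact distillation objective DistilLog uses. As a rough illustration of how knowledge distillation lets a small student model learn from a larger teacher, the following is a minimal sketch of the standard temperature-scaled loss (Hinton et al.) for a single sample with labels normal = 0 and anomalous = 1. The function names, the temperature `temperature`, and the weighting `alpha` are illustrative assumptions, not the paper's settings. The word2vec embedding and PCA steps are not shown.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracts the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 2.0,  # hypothetical value, not taken from the paper
    alpha: float = 0.5,        # hypothetical weight between hard and soft terms
) -> float:
    """Generic KD loss (assumed form, not DistilLog's confirmed objective):
    alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    # Softened distributions from the large teacher and the small student.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence of the student's softened output from the teacher's.
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # Ordinary cross-entropy against the ground-truth label at T = 1.
    hard_ce = -math.log(softmax(student_logits)[label])
    # T^2 keeps the soft-target gradients on a scale comparable to the hard term.
    return alpha * hard_ce + (1.0 - alpha) * temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [3.0, -2.0]  # teacher is confident the sequence is normal
    student = [0.5, 0.2]   # untrained student is unsure
    print(distillation_loss(student, teacher, label=0))
```

When the student's logits equal the teacher's, the KL term vanishes and only the weighted hard-label cross-entropy remains. That is why the soft term is what pushes a compact student toward the teacher's behavior.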