Improving Log Anomaly Detection Via Spatial Pooling: Combining Spclassifier with Ensemble Method

Hironori Uchida, Keitaro Tominaga, Hideki Itai, Yujie Li, Yoshihisa Nakatoh
DOI: https://doi.org/10.2139/ssrn.4719837
2024-01-01
Cognitive Robotics
Abstract: As modern software systems become increasingly complex, the data that engineers need to analyze is growing in both volume and difficulty. For this reason, many studies have applied deep learning to log anomaly detection. However, no significant results have been obtained in the software development field, and several issues remain. This study addresses two of these issues for practical use in software development. The first is to ensure that accuracy does not degrade under various dataset conditions, and the second is to create an AI model that can be trained ad hoc for software that is updated daily. In this experiment, we aim to improve accuracy by applying three ensemble methods to SPClassifier, an AI model that enables anomaly detection in resource-constrained environments and can be trained ad hoc. The three ensemble methods are Pasting, Bagging, and Improved Bagging, a proposed method that combines features of both. Ten cross-validations were conducted to compare the results with a CNN and to compare the improvement obtained with each ensemble method. The results showed a significant decrease in variance and improved accuracy for all ensemble methods. Compared to the basic method (SPClassifier without any ensemble method), all ensemble methods achieved equal or better F1 scores. In particular, Improved Bagging demonstrated the largest improvement in accuracy, at 0.218. In addition, it achieved results nearly identical to those of representative CNN models and, depending on the type of dataset, even outperformed them by 0.167 in F1 score. Pasting and Bagging showed distinct characteristics, with Pasting improving Precision and Bagging improving Recall. Improved Bagging, which has both features, proved to be the most stable method across all evaluation metrics.
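To make the distinction between the two baseline ensemble methods concrete, the following is a minimal sketch of how Pasting and Bagging differ in drawing training subsets. It is an illustration only, not the authors' implementation: the function names, the subset size, the integer stand-ins for log sequences, and the majority-vote aggregation are all assumptions, and the abstract does not specify how Improved Bagging combines the two, so it is not sketched here.

```python
import random
from collections import Counter


def pasting_sample(data, k, rng):
    """Pasting: draw k items WITHOUT replacement, so no item repeats."""
    return rng.sample(data, k)


def bagging_sample(data, k, rng):
    """Bagging: draw k items WITH replacement, so items may repeat."""
    return [rng.choice(data) for _ in range(k)]


def majority_vote(predictions):
    """Combine base-model labels by majority vote (assumed aggregation rule)."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
logs = list(range(10))  # hypothetical stand-ins for ten training log sequences

paste_subset = pasting_sample(logs, 6, rng)  # six distinct items
bag_subset = bagging_sample(logs, 6, rng)    # six items, duplicates possible

# Each subset would train one base classifier; their labels are then combined.
# Here 1 = anomalous, 0 = normal, with three hypothetical base-model outputs.
ensemble_label = majority_vote([1, 0, 1])
```

In this sketch, each base model sees a different subset of the training logs, and the ensemble prediction is the label most base models agree on.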