Large language models and unsupervised feature learning: implications for log analysis

Egil Karlsen, Xiao Luo, Nur Zincir-Heywood, Malcolm Heywood
DOI: https://doi.org/10.1007/s12243-024-01028-2
2024-04-05
Annals of Telecommunications
Abstract: Log file analysis is increasingly being addressed through the use of large language models (LLMs). LLMs provide a mechanism for discovering embeddings that distinguish between the different behaviors present in log files. In this work, we are interested in discriminating between normal and anomalous behaviors via an unsupervised learning approach. To this end, five recent LLM architectures are first evaluated over six different log files. Further experiments then explicitly quantify the significance of performing self-supervised fine-tuning on the LLMs. Moreover, we show that the quality of the (unsupervised) feature map used to make the overall (normal/anomalous) predictions may also benefit from an AutoEncoder stage between the LLM and the feature map. Such an AutoEncoder significantly reduces the cost of training the feature map and typically improves the quality of the resulting predictions.
telecommunications
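The pipeline outlined in the abstract (LLM embeddings, then an AutoEncoder, then an unsupervised feature map) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the embeddings are random stand-ins for real LLM outputs, the autoencoder is linear and trained by plain SGD, the feature map is assumed to be a small self-organizing map, and the anomaly score is assumed to be the distance to the best-matching unit. All function names (`train_autoencoder`, `train_som`, `anomaly_score`) are hypothetical.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def train_autoencoder(data, code_dim, epochs=20, lr=0.005, seed=0):
    """Fit a linear autoencoder by SGD; return (encoder, decoder) matrices."""
    rng = random.Random(seed)
    dim = len(data[0])
    enc = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(code_dim)]
    dec = [[rng.gauss(0, 0.1) for _ in range(code_dim)] for _ in range(dim)]
    for _ in range(epochs):
        for x in data:
            z = matvec(enc, x)  # compressed code
            err = [r - xi for r, xi in zip(matvec(dec, z), x)]
            # Gradient of 0.5 * ||err||^2 with respect to the code z,
            # computed before the decoder weights are updated.
            grad_z = [sum(dec[i][j] * err[i] for i in range(dim))
                      for j in range(code_dim)]
            for i in range(dim):
                for j in range(code_dim):
                    dec[i][j] -= lr * err[i] * z[j]
            for j in range(code_dim):
                for i in range(dim):
                    enc[j][i] -= lr * grad_z[j] * x[i]
    return enc, dec


def train_som(codes, rows=3, cols=3, epochs=20, lr=0.5, sigma=1.0, seed=0):
    """Train a small self-organizing map; return {(row, col): weight vector}."""
    rng = random.Random(seed)
    units = {(r, c): list(rng.choice(codes))
             for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # linearly shrink the step size
        for z in codes:
            bmu = min(units, key=lambda u: math.dist(units[u], z))
            for (r, c), w in units.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = lr * decay * math.exp(-grid_d2 / (2 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += h * (z[k] - w[k])
    return units


def anomaly_score(units, z):
    """Quantization error: distance from z to its best-matching unit."""
    return min(math.dist(w, z) for w in units.values())


# Stand-in for LLM embeddings of log lines (hypothetical, 8-dimensional).
rng = random.Random(42)
normal = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]

enc, dec = train_autoencoder(normal, code_dim=3)
codes = [matvec(enc, x) for x in normal]  # AutoEncoder stage: 8 -> 3 dims
som = train_som(codes)                    # feature map trained on the codes
scores = [anomaly_score(som, z) for z in codes]
```

Training the map on 3-dimensional codes rather than on the raw 8-dimensional vectors is the point of the sketch: each best-matching-unit search and weight update touches fewer coordinates, which is one plausible reading of the training-cost reduction the abstract reports. In practice the embeddings would come from a real LLM and would be far higher-dimensional.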