Large Language Model Guided Knowledge Distillation for Time Series Anomaly Detection

Chen Liu, Shibo He, Qihang Zhou, Shizhong Li, Wenchao Meng
2024-01-26
Abstract: Self-supervised methods have gained prominence in time series anomaly detection due to the scarcity of available annotations. Nevertheless, they typically demand extensive training data to acquire a generalizable representation map, which conflicts with scenarios where only a few samples are available and thereby limits their performance. To overcome this limitation, we propose AnomalyLLM, a knowledge distillation-based time series anomaly detection approach in which the student network is trained to mimic the features of a large language model (LLM)-based teacher network that is pretrained on large-scale datasets. During the testing phase, anomalies are detected when the discrepancy between the features of the teacher and student networks is large. To prevent the student network from learning the teacher network's features of anomalous samples, we devise two key strategies. 1) Prototypical signals are incorporated into the student network to consolidate the normal feature extraction. 2) We use synthetic anomalies to enlarge the representation gap between the two networks. AnomalyLLM demonstrates state-of-the-art performance on 15 datasets, improving accuracy by at least 14.5% on the UCR dataset.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses a limitation of self-supervised time series anomaly detection. Self-supervised methods are popular because labeled anomalies are scarce, but they typically need a large amount of training data to learn generalizable representations, so their performance degrades when only a few samples are available. The paper proposes AnomalyLLM, a knowledge distillation-based method in which a large language model (LLM) pretrained on large-scale data serves as the teacher network, and a student network is trained to reproduce the teacher's features on normal samples. At test time, a large discrepancy between teacher and student features indicates an anomaly. The main contributions stated in the paper are: 1. Proposing what the authors describe as the first knowledge distillation-based time series anomaly detection method. 2. Designing a pretrained LLM teacher network adapted to time series, which learns rich and generalizable representations through fine-tuning. 3. Introducing prototypical signals and a training strategy based on data augmentation with synthetic anomalies, so that the teacher and student features remain far apart on anomalous samples. 4. Conducting extensive experiments on 15 real-world datasets to demonstrate the superior performance of the proposed model.
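
The test-time rule described above (flag a sample when teacher and student features disagree) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the cosine distance as the discrepancy measure, the fixed threshold, the function names, and the toy `toy_teacher` and `toy_student` feature extractors, which stand in for the LLM-based teacher and the trained student.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
FeatureFn = Callable[[Sequence[float]], Vector]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity, clamped at 0 to absorb rounding error."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: an all-zero feature vector counts as dissimilar
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


def anomaly_scores(windows: Sequence[Sequence[float]],
                   teacher: FeatureFn,
                   student: FeatureFn) -> List[float]:
    """Score each window by the teacher-student feature discrepancy."""
    return [cosine_distance(teacher(w), student(w)) for w in windows]


def flag_anomalies(scores: Sequence[float], threshold: float) -> List[bool]:
    """Mark a window as anomalous when its score exceeds the threshold."""
    return [s > threshold for s in scores]


# Hypothetical stand-ins: the teacher describes a window by its mean and range,
# while the student has only "learned" that normal windows have a range of 1.0.
def toy_teacher(window: Sequence[float]) -> Vector:
    return [sum(window) / len(window), max(window) - min(window)]


def toy_student(window: Sequence[float]) -> Vector:
    return [sum(window) / len(window), 1.0]


normal_window = [0.0, 1.0, 0.0, 1.0]
spiky_window = [0.0, 1.0, 0.0, 9.0]
scores = anomaly_scores([normal_window, spiky_window], toy_teacher, toy_student)
flags = flag_anomalies(scores, threshold=0.1)
```

On the normal window the two toy extractors agree, so the score is close to zero. On the window with a spike the student's features no longer match the teacher's, so the score rises above the threshold. In practice the threshold would have to be chosen from validation data rather than fixed by hand.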