Label-Efficient Interactive Time-Series Anomaly Detection

Hong Guo, Yujing Wang, Jieyu Zhang, Zhengjie Lin, Yunhai Tong, Lei Yang, Luoxing Xiong, Congrui Huang
DOI: https://doi.org/10.48550/arXiv.2212.14621
2022-12-30
Abstract: Time-series anomaly detection is an important task and has been widely applied in the industry. Since manual data annotation is expensive and inefficient, most applications adopt unsupervised anomaly detection methods, but the results are usually sub-optimal and unsatisfactory to end customers. Weak supervision is a promising paradigm for obtaining considerable labels in a low-cost way, which enables the customers to label data by writing heuristic rules rather than annotating each instance individually. However, in the time-series domain, it is hard for people to write reasonable labeling functions as the time-series data is numerically continuous and difficult to be understood. In this paper, we propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system, which enables a user to improve the results of unsupervised anomaly detection by performing only a small amount of interactions with the system. To achieve this goal, the system integrates weak supervision and active learning collaboratively while generating labeling functions automatically using only a few labeled data. All of these techniques are complementary and can promote each other in a reinforced manner. We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions in both weak supervision and active learning areas. Also, the system has been tested in a real scenario in industry to show its practicality.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper asks how to improve time-series anomaly detection when labeled data is scarce. Specifically, it focuses on:

1. **Insufficient label data**: Manual data annotation is costly and inefficient, so most applications fall back on unsupervised anomaly detection. The results of these methods are usually sub-optimal and do not meet the needs of end customers.
2. **Application of weak supervision**: Weak supervision is a promising way to obtain a large number of labels at low cost, because users label data by writing heuristic rules instead of annotating each instance one by one. In the time-series domain, however, the data is numerically continuous and hard to interpret, so it is difficult for people to write reasonable labeling functions.
3. **Interactive improvement**: The paper proposes a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system, which lets users improve the results of unsupervised anomaly detection through a small number of interactions with the system.

### Main contributions of the paper

1. **System design**: The LEIAD system combines weak supervision and active learning and automatically generates labeling functions from only a small amount of labeled data. These techniques are complementary and reinforce each other.
2. **Combination of weak supervision and active learning**: Beyond integrating active learning with weak supervision, the system uses a hybrid query strategy that selects the most informative samples for annotation at different stages.
3. **Labeling function generation**: Two ways of generating labeling functions are proposed. Initial labeling functions come from unsupervised anomaly detectors. New labeling functions are generated during interaction by searching for data points at timestamps similar to the ones the user has labeled.

### Method overview

1. **Unsupervised Anomaly Detector (UAD)**: The system adopts five common unsupervised anomaly detectors (i-Forest, SR, STL, RC-Forest, Luminol) as the initial labeling functions.
2. **Weak supervision module**: A weak supervision model such as Snorkel aggregates the votes of the labeling functions into weak labels.
3. **End model**: A final classification model (such as LightGBM) is trained on the generated weak labels for anomaly detection.
4. **Active learning**: A hybrid query strategy combines uncertainty, diversity, and anomaly probability to select the most informative samples for annotation.
5. **Labeling function generation**: A method such as TS2Vec produces dense representation vectors of the time series. New labeling functions are generated by querying points similar to a labeled point and extending the same label to them.

### Experimental verification

The paper reports experiments on three time-series anomaly detection datasets (Yahoo, KPI, Microsoft) to verify the effectiveness of the LEIAD system. According to the reported results, LEIAD outperforms existing methods in both weak supervision and active learning, and the system has also been tested in a real industrial scenario to show its practicality.
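The loop described in the method overview (labeling-function votes, weak-label aggregation, a hybrid query, and a new labeling function derived from one user label) can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it is hypothetical. It makes several simplifying assumptions: z-score thresholds stand in for the five unsupervised detectors, a plain vote fraction stands in for a Snorkel-style label model, raw sliding windows stand in for TS2Vec embeddings, diversity is approximated by a minimum time gap between queried points, and the end model is omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

ABSTAIN, NORMAL, ANOMALY = -1, 0, 1

def zscore_lf(x, threshold):
    """Toy labeling function (stand-in for an unsupervised detector)."""
    z = np.abs(x - x.mean()) / (x.std() + 1e-12)
    votes = np.full(len(x), ABSTAIN)
    votes[z > threshold] = ANOMALY      # clearly far from the mean
    votes[z < threshold / 2] = NORMAL   # clearly close to the mean
    return votes

def aggregate(votes):
    """Fraction of non-abstaining votes that say ANOMALY (0.5 if all abstain).

    A simple stand-in for a learned label model."""
    anomaly = (votes == ANOMALY).sum(axis=0)
    normal = (votes == NORMAL).sum(axis=0)
    total = anomaly + normal
    return np.where(total > 0, anomaly / np.maximum(total, 1), 0.5)

def hybrid_query(prob, k, min_gap=5, w_unc=0.5, w_anom=0.5):
    """Rank points by uncertainty plus anomaly probability, then greedily
    keep the top ones that are at least `min_gap` steps apart (diversity)."""
    uncertainty = 1.0 - 2.0 * np.abs(prob - 0.5)
    score = w_unc * uncertainty + w_anom * prob
    chosen = []
    for idx in np.argsort(-score, kind="stable"):
        if all(abs(int(idx) - c) >= min_gap for c in chosen):
            chosen.append(int(idx))
        if len(chosen) == k:
            break
    return chosen

def window_embed(x, half_width=2):
    """Stand-in for a learned representation: the raw window around each point."""
    padded = np.pad(x, half_width, mode="edge")
    return sliding_window_view(padded, 2 * half_width + 1)

def similarity_lf(embeddings, labeled_idx, label, min_cos=0.9):
    """New labeling function from one user label: vote `label` on points whose
    embedding is cosine-similar to the labeled point, abstain elsewhere."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-12
    unit = embeddings / norms
    sim = unit @ unit[labeled_idx]
    votes = np.full(len(embeddings), ABSTAIN)
    votes[sim >= min_cos] = label
    return votes

# Synthetic series: a small sine wave, two large spikes, one moderate spike.
x = 0.1 * np.sin(np.arange(100.0))
x[30] += 5.0
x[70] += 5.0
x[50] += 2.0

# 1) Initial labeling functions vote, 2) votes are aggregated into weak labels.
votes = np.stack([zscore_lf(x, t) for t in (2.0, 3.0, 8.0)])
prob = aggregate(votes)

# 3) The hybrid query picks the points to show to the user.
chosen = hybrid_query(prob, k=2)

# 4) Suppose the user labels the first queried point as an anomaly
#    (a hypothetical answer); a new labeling function spreads that label.
new_votes = similarity_lf(window_embed(x), chosen[0], ANOMALY)
updated_prob = aggregate(np.vstack([votes, new_votes]))

print("queried:", chosen)
print("new LF marks:", np.flatnonzero(new_votes == ANOMALY).tolist())
```

In this toy setup the moderate spike is the point on which the initial labeling functions disagree, so it is queried first, and the label it receives is extended to the two large spikes because their windows have a similar shape. In the system described above, the aggregated weak labels would then be used to train the end model.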