COFT-AD: COntrastive Fine-Tuning for Few-Shot Anomaly Detection

Jingyi Liao, Xun Xu, Manh Cuong Nguyen, Adam Goodge, Chuan Sheng Foo
2024-02-29
Abstract: Existing approaches towards anomaly detection (AD) often rely on a substantial amount of anomaly-free data to train representation and density models. However, large anomaly-free datasets may not always be available before the inference stage, in which case an anomaly detection model must be trained with only a handful of normal samples, a.k.a. few-shot anomaly detection (FSAD). In this paper, we propose a novel methodology to address the challenge of FSAD which incorporates two important techniques. Firstly, we employ a model pre-trained on a large source dataset to initialize model weights. Secondly, to ameliorate the covariate shift between source and target domains, we adopt contrastive training to fine-tune on the few-shot target domain data. To learn suitable representations for the downstream AD task, we additionally incorporate cross-instance positive pairs to encourage a tight cluster of the normal samples, and negative pairs for better separation between normal and synthesized negative samples. We evaluate few-shot anomaly detection on 3 controlled AD tasks and 4 real-world AD tasks to demonstrate the effectiveness of the proposed method.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper aims to solve the problem of anomaly detection when only a small number of normal samples are available. Specifically, existing anomaly detection methods usually rely on a large amount of anomaly-free data to train representation and density models. However, in practical applications, such large amounts of anomaly-free data may not be available before the inference stage. Therefore, a method that can train an anomaly detection model with only a few normal samples is required, namely Few-Shot Anomaly Detection (FSAD).

### Main contributions of the paper:

1. **Propose a new few-shot anomaly detection framework**: By combining transfer learning of pre-trained models with representation learning on a small amount of normal data in the target domain, the "COntrastive Fine-Tuning for Few-Shot Anomaly Detection (COFT-AD)" method is proposed.
2. **Introduce cross-instance positive pair loss**: To encourage normal samples to form tight clusters in the feature space, a cross-instance positive pair loss is further introduced, which is helpful for density-based anomaly detection.
3. **Integrate negative pair loss**: When prior knowledge about anomalies is available, the separation between normal samples and abnormal samples is further optimized by synthesizing negative samples.
4. **Extensive experimental verification**: Extensive experiments were carried out on 3 controlled anomaly detection tasks and 4 real-world industrial defect identification tasks, demonstrating the effectiveness and competitiveness of this method.

### Method overview:

- **Contrastive fine-tuning**: Initialize the target-domain model with the weights of the pre-trained model, and fine-tune it on the target-domain data through contrastive learning to adapt to the target-domain distribution.
- **Cross-instance positive pair loss**: Encourage normal samples to form tight clusters in the feature space by maximizing the cosine similarity between randomly selected pairs of normal samples (equivalently, minimizing the negative cosine similarity in \(L_{\text{PP}}\)).
- **Negative pair loss**: By synthesizing negative samples, minimize the cosine similarity between normal samples and their negative samples to enhance the separation between normal samples and abnormal samples.

### Formula summary:

- **Contrastive loss**:
  \[ L_{\text{Con}} = -\frac{1}{N_T} \sum_{X_i \in D_T} \frac{q(g(z_i))^T g(\hat{z}_i)}{\|q(g(z_i))\| \cdot \|g(\hat{z}_i)\|} \]
- **Cross-instance positive pair loss**:
  \[ L_{\text{PP}} = -\frac{1}{2N_T} \sum_{i} \sum_{j \in p} \left( \frac{f(t(X_i); \Theta)^T f(t(X_j); \hat{\Theta})}{\|f(t(X_i); \Theta)\| \cdot \|f(t(X_j); \hat{\Theta})\|} + \frac{f(t(X_j); \Theta)^T f(t(X_i); \hat{\Theta})}{\|f(t(X_j); \Theta)\| \cdot \|f(t(X_i); \hat{\Theta})\|} \right) \]
- **Negative pair loss**:
  \[ L_{\text{NP}} = \frac{1}{N_T} \sum_{i} \frac{f(X_i; \hat{\Theta})^T f(t_n(X_i); \Theta)}{\|f(X_i; \hat{\Theta})\| \cdot \|f(t_n(X_i); \Theta)\|} \]
- **Total loss**:
  \[ L_{\text{all}} = L_{\text{Con}} + \lambda_{\text{PP}} L_{\text{PP}} + \lambda_{\text{
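
As a rough illustration of the three loss terms in the formula summary, the NumPy sketch below evaluates them on precomputed embeddings. It is not the authors' implementation: the function names, the `partner` index array (one randomly chosen partner per sample), and the use of plain arrays in place of network outputs under the parameter sets Θ and Θ̂ are assumptions made for this example. Gradient computation and the weighted total loss are left out.

```python
import numpy as np


def cosine(a, b, eps=1e-12):
    """Row-wise cosine similarity between two (N, D) arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, eps)


def contrastive_loss(pred, proj_hat):
    """L_Con: negative mean cosine similarity.

    pred stands in for q(g(z_i)); proj_hat stands in for g(z_hat_i).
    """
    return -np.mean(cosine(pred, proj_hat))


def positive_pair_loss(f_theta, f_theta_hat, partner):
    """L_PP with one partner j = partner[i] per sample (an assumption).

    f_theta[i] stands in for f(t(X_i); Theta),
    f_theta_hat[i] for f(t(X_i); Theta_hat).
    """
    sim_ij = cosine(f_theta, f_theta_hat[partner])   # i under Theta, j under Theta_hat
    sim_ji = cosine(f_theta[partner], f_theta_hat)   # j under Theta, i under Theta_hat
    return -np.sum(sim_ij + sim_ji) / (2 * len(f_theta))


def negative_pair_loss(f_hat_normal, f_theta_negative):
    """L_NP: mean cosine similarity between normal and synthesized negatives.

    f_hat_normal[i] stands in for f(X_i; Theta_hat),
    f_theta_negative[i] for f(t_n(X_i); Theta).
    """
    return np.mean(cosine(f_hat_normal, f_theta_negative))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 8))        # hypothetical few-shot embeddings
    partner = rng.permutation(len(feats))  # hypothetical random pairing
    print(contrastive_loss(feats, feats))
    print(positive_pair_loss(feats, feats, partner))
    print(negative_pair_loss(feats, -feats))
```

In this sketch, `contrastive_loss(a, a)` comes out at approximately -1, the minimum of the loss, and `negative_pair_loss` reaches its minimum of approximately -1 when every negative embedding points in the opposite direction to its normal counterpart.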