Abstract: Industrial anomaly classification (AC) is an indispensable task in industrial manufacturing, which guarantees the quality and safety of various products. To address the scarcity of data in industrial scenarios, many few-shot anomaly detection methods have emerged recently. In this paper, we propose an effective few-shot anomaly classification (FSAC) framework with one-stage training, dubbed CLIP-FSAC++. Specifically, we introduce a cross-modality interaction module named Anomaly Descriptor following the image and text encoders, which enhances the correlation between visual and text embeddings and adapts the representations of CLIP from the pre-training data to the target data. In the Anomaly Descriptor, an image-to-text cross-attention module is used to obtain image-specific text embeddings, and a text-to-image cross-attention module is used to obtain text-specific visual embeddings. These modality-specific embeddings are then used to enhance the original representations of CLIP for better matching ability. Comprehensive experimental results are provided to evaluate our method on few-normal-shot anomaly classification on VisA and MVTEC-AD in the 1-, 2-, 4-, and 8-shot settings. The source code is available at https://github.com/Jay-zzcoder/clip-fsac-pp

What problem does this paper attempt to address?

This paper addresses the data scarcity problem in the anomaly classification (AC) task in industrial manufacturing. Specifically, the authors propose an effective few-shot anomaly classification framework, CLIP-FSAC++, to cope with the scarcity of abnormal samples in industrial scenarios. The specific problems the paper targets are:

1. **Data Scarcity Problem**:
   - In industrial manufacturing, abnormal samples appear very rarely, so it is difficult to collect enough abnormal data for model training.
   - Data labeling is time-consuming and labor-intensive, so traditional supervised learning methods are hard to apply in these scenarios.
2. **Limitations of Existing Methods**:
   - Unsupervised anomaly detection methods can be trained without abnormal samples, but their performance does not meet all practical requirements.
   - Existing few-shot anomaly detection (FSAD) methods can work with a small number of normal samples, but they still have shortcomings in practice, such as high computational cost and limited generalization ability.
3. **Cross-Modal Matching Problem**:
   - In anomaly classification, matching between visual features and text descriptions is crucial. However, finding accurate text prompts that describe normal and abnormal states is difficult, which leads to visual-language mismatch.
   - The image distribution in industrial scenarios differs considerably from the natural-image distribution of the pre-training dataset, so the pre-trained visual representations are insufficient.

To solve these problems, the authors propose the CLIP-FSAC++ framework. By introducing lightweight image and text adapters and a cross-modal interaction module (Anomaly Descriptor), it enhances the matching and generalization abilities of CLIP in few-shot anomaly classification.

Specific improvements include:

- **Introducing Lightweight Adapters**: Image and text adapters adjust the prior representations of CLIP to make them better suited to the industrial domain.
- **Designing a Cross-Modal Interaction Module**: Cross-attention from image to text and from text to image strengthens the correlation between visual and text features, thereby improving classification performance.
- **Simplifying the Training Strategy**: A joint (one-stage) training strategy replaces two-stage training, simplifying the training process and reducing computational cost.

With these improvements, the experimental results on the VisA and MVTEC-AD datasets show that CLIP-FSAC++ outperforms existing few-shot anomaly detection methods in the 1-shot, 2-shot, 4-shot, and 8-shot settings, and even exceeds some anomaly detection methods trained on the full set of samples.
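The bidirectional cross-attention idea described above can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: it omits the adapters and all learned projection weights, and several choices are assumptions made only for illustration, namely that text embeddings act as queries over image tokens to produce image-specific text embeddings (and vice versa), that the enhancement is a simple residual addition, that the image is summarized by mean pooling, and that the final score is a temperature-scaled softmax over cosine similarities. All function names (`cross_attention`, `anomaly_descriptor`, `anomaly_score`) are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention without learned projections (assumption).

    Each output row is a convex combination of the rows of keys_values,
    weighted by how similar they are to the corresponding query row.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys_values])
        out.append([
            sum(w * kv[i] for w, kv in zip(weights, keys_values))
            for i in range(len(q))
        ])
    return out


def anomaly_descriptor(visual, text):
    """Toy cross-modal interaction; returns (enhanced_visual, enhanced_text).

    visual: list of image token embeddings, each of dimension d
    text:   [normal_embedding, abnormal_embedding], each of dimension d
    """
    # Assumption: text queries attend over image tokens.
    image_specific_text = cross_attention(text, visual)
    # Assumption: image tokens (queries) attend over text embeddings.
    text_specific_visual = cross_attention(visual, text)
    # Assumption: originals are enhanced by residual addition.
    enhanced_visual = [[a + b for a, b in zip(v, s)]
                       for v, s in zip(visual, text_specific_visual)]
    enhanced_text = [[a + b for a, b in zip(t, s)]
                     for t, s in zip(text, image_specific_text)]
    return enhanced_visual, enhanced_text


def anomaly_score(visual, text, temperature=0.07):
    """Probability assigned to the 'abnormal' text embedding (index 1)."""
    enhanced_visual, enhanced_text = anomaly_descriptor(visual, text)
    dim = len(visual[0])
    # Assumption: mean pooling over tokens summarizes the image.
    pooled = normalize([
        sum(tok[i] for tok in enhanced_visual) / len(enhanced_visual)
        for i in range(dim)
    ])
    logits = [dot(pooled, normalize(t)) / temperature for t in enhanced_text]
    return softmax(logits)[1]


if __name__ == "__main__":
    # Hand-made 4-dimensional embeddings, not real CLIP features.
    text = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    normal_like = [[0.9, 0.1, 0.0, 0.0], [1.0, 0.0, 0.1, 0.0]]
    abnormal_like = [[0.1, 0.9, 0.0, 0.0], [0.0, 1.0, 0.1, 0.0]]
    print("normal-like image score:  ", anomaly_score(normal_like, text))
    print("abnormal-like image score:", anomaly_score(abnormal_like, text))
```

In a real system the token and text embeddings would come from the CLIP encoders, and the attention layers would carry trainable query, key, and value projections fitted on the few available normal shots; the sketch only shows how each modality's embeddings can be conditioned on the other before matching.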

CLIP-FSAC++: Few-Shot Anomaly Classification with Anomaly Descriptor Based on CLIP

CLIP3D-AD: Extending CLIP for 3D Few-Shot Anomaly Detection with Multi-View Images Generation

FADE: Few-shot/zero-shot Anomaly Detection Engine using Large Vision-Language Model

WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation

Random Word Data Augmentation with CLIP for Zero-Shot Anomaly Detection

Few-Shot Classification of Screen Defects with Class-Agnostic Mask and Context-Based Classifier.

AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection

Ultrasound-guided stellate ganglion block successfully prevented esophageal puncture.

CLIP-AD: A Language-Guided Staged Dual-Path Model for Zero-shot Anomaly Detection

Registration Based Few-Shot Anomaly Detection

MediCLIP: Adapting CLIP for Few-shot Medical Image Anomaly Detection

COFT-AD: COntrastive Fine-Tuning for Few-Shot Anomaly Detection

Few-Shot Anomaly Detection via Category-Agnostic Registration Learning

Prioritized Local Matching Network for Cross-Category Few-Shot Anomaly Detection

Few-shot Online Anomaly Detection and Segmentation

Dual-path Frequency Discriminators for Few-shot Anomaly Detection

Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection

Delving into CLIP latent space for Video Anomaly Recognition

An Effective Industrial Defect Classification Method under the Few-Shot Setting Via Two-Stream Training

AnomalySD: Few-Shot Multi-Class Anomaly Detection with Stable Diffusion Model