Abstract: Identifying defects and anomalies in industrial products is a critical quality control task. Traditional manual inspection methods are slow, subjective, and error-prone. In this work, we propose a novel zero-shot, training-free approach for automated industrial image anomaly detection using a multimodal machine learning pipeline consisting of three foundation models. Our method first uses a large language model, i.e., GPT-3, to generate text prompts describing the expected appearances of normal and abnormal products. We then use a grounding object detection model, called Grounding DINO, to locate the product in the image. Finally, we compare the cropped product image patches to the generated prompts using a zero-shot image-text matching model, called CLIP, to identify any anomalies. Our experiments on two datasets of industrial product images, namely MVTec-AD and VisA, demonstrate the effectiveness of this method, achieving high accuracy in detecting various types of defects and anomalies without the need for model training. Our proposed model enables efficient, scalable, and objective quality control in industrial manufacturing settings.
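The three-stage pipeline in the abstract can be sketched as one function that receives the models as callables. This is a minimal, hypothetical sketch: the interface names (`generate_prompts`, `detect_box`, `encode_image`, `encode_text`) are illustrative and do not come from the paper or from the GPT-3, Grounding DINO, or CLIP APIs. The scoring rule is also an assumption, since the abstract does not specify it: each prompt set is averaged into one text embedding, and a softmax is taken over cosine similarities scaled by a logit scale of 100 (CLIP's usual zero-shot classification recipe). The toy stubs at the bottom only exercise the control flow.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

# Hypothetical interfaces standing in for the three foundation models.
PromptGenerator = Callable[[str], Tuple[List[str], List[str]]]        # LLM, e.g. GPT-3
BoxDetector = Callable[[np.ndarray, str], Tuple[int, int, int, int]]  # e.g. Grounding DINO -> (x, y, w, h)
ImageEncoder = Callable[[np.ndarray], np.ndarray]                     # e.g. CLIP image encoder
TextEncoder = Callable[[Sequence[str]], np.ndarray]                   # e.g. CLIP text encoder, one row per prompt


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def detect_anomaly(
    image: np.ndarray,
    product: str,
    generate_prompts: PromptGenerator,
    detect_box: BoxDetector,
    encode_image: ImageEncoder,
    encode_text: TextEncoder,
    logit_scale: float = 100.0,  # assumed value, close to CLIP's learned scale
) -> float:
    """Return the probability that `image` shows an anomalous `product` (no training)."""
    # Stage 1: prompts describing normal and abnormal appearances.
    normal_prompts, anomaly_prompts = generate_prompts(product)

    # Stage 2: localize the product and crop away the background.
    x, y, w, h = detect_box(image, product)
    crop = image[y:y + h, x:x + w]

    # Stage 3: average each prompt set into one unit-length class embedding,
    # then softmax over scaled cosine similarities (assumed scoring rule).
    img = l2_normalize(encode_image(crop))
    t_normal = l2_normalize(l2_normalize(encode_text(normal_prompts)).mean(axis=0))
    t_anomaly = l2_normalize(l2_normalize(encode_text(anomaly_prompts)).mean(axis=0))
    logits = logit_scale * np.array([img @ t_normal, img @ t_anomaly])
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return float(probs[1])


# Toy stubs that only exercise the control flow (not real models).
def toy_prompts(product: str) -> Tuple[List[str], List[str]]:
    return [f"a photo of a flawless {product}"], [f"a photo of a damaged {product}"]


def toy_box(image: np.ndarray, product: str) -> Tuple[int, int, int, int]:
    return 1, 1, 2, 2  # fixed box: x=1, y=1, w=2, h=2


def toy_image_encoder(crop: np.ndarray) -> np.ndarray:
    # Pretend bright crops look "damaged" in embedding space.
    return np.array([0.0, 1.0]) if crop.mean() > 0.5 else np.array([1.0, 0.0])


def toy_text_encoder(prompts: Sequence[str]) -> np.ndarray:
    return np.array([[0.0, 1.0] if "damaged" in p else [1.0, 0.0] for p in prompts])


normal_img = np.zeros((4, 4))
defect_img = np.zeros((4, 4))
defect_img[1:3, 1:3] = 1.0  # bright patch inside the toy box

stubs = (toy_prompts, toy_box, toy_image_encoder, toy_text_encoder)
print(detect_anomaly(normal_img, "screw", *stubs))  # prints a value near 0
print(detect_anomaly(defect_img, "screw", *stubs))  # prints a value near 1
```

Passing the models in as callables keeps the orchestration testable without downloading any weights; in practice each stub would wrap a real model.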

What problem does this paper attempt to address?

The problem this paper addresses is defect and anomaly detection in images of industrial products. Traditional manual inspection is slow, highly subjective, and error-prone. To address these issues, the authors propose a novel zero-shot, training-free method for automated industrial image anomaly detection. The method is implemented as a multimodal machine-learning pipeline built from three foundation models:

1. **Text Prompt Generation**: A large language model (GPT-3) generates text prompts describing normal and abnormal products.
2. **Object Localization**: The Grounding DINO model locates the product in the image, reducing the influence of background noise and handling multi-resolution challenges.
3. **Zero-shot Image-Text Matching**: The pre-trained CLIP model compares the cropped product image with the generated text prompts to identify anomalies.

### Formula Representation

The key steps can be written as follows:

1. **Text Prompt Generation**:
   \[ P_{\text{normal}} = \text{GPT-3}(x_{\text{normal}}) \]
   \[ P_{\text{anomaly}} = \text{GPT-3}(x_{\text{anomaly}}) \]
   where \( x_{\text{normal}} \) and \( x_{\text{anomaly}} \) are the inputs given to GPT-3, and \( P_{\text{normal}} \) and \( P_{\text{anomaly}} \) are the resulting sets of text prompts describing normal and abnormal products, respectively.
2. **Object Localization**:
   \[ I_{\text{object}} = I[y:y+h,\ x:x+w] \]
   where \( I \) is the input image, \( b = [x, y, w, h] \) are the bounding-box coordinates output by Grounding DINO (top-left corner \( (x, y) \), width \( w \), height \( h \)), and \( I_{\text{object}} \) is the cropped region containing the product.
3. **Zero-shot Anomaly Detection**:
   \[ s = \frac{e_{\text{fused}} \cdot t_{\text{anomaly}}}{e_{\text{fused}} \cdot t_{\text{anomaly}} + e_{\text{fused}} \cdot t_{\text{normal}}} \]
   where \( e_{\text{fused}} \) is the fused feature vector, \( t_{\text{normal}} \) and \( t_{\text{anomaly}} \) are the text embedding vectors of the "normal" and "anomaly" prompts, respectively, and \( s \) is the anomaly score. A larger \( s \) means the image matches the "anomaly" prompts more closely than the "normal" ones.

### Summary

With this method, the authors aim to improve the efficiency, scalability, and objectivity of quality control in industrial manufacturing without requiring model training or large amounts of labeled data. Experiments on two industrial product image datasets (MVTec-AD and VisA) show that the method performs well and can detect various types of defects and anomalies.
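The localization and scoring formulas translate directly into array operations. The sketch below assumes NumPy, an image stored as a height-by-width (optionally by-channels) array so that rows index \( y \) and columns index \( x \), and precomputed embedding vectors; the function names `crop_object` and `anomaly_score` are hypothetical. The positivity check is an added guard, not part of the paper: the ratio only behaves like a score in \( [0, 1] \) when both dot products are positive, which is usually, but not always, the case for CLIP image-text similarities.

```python
from typing import Tuple

import numpy as np


def crop_object(image: np.ndarray, box: Tuple[int, int, int, int]) -> np.ndarray:
    """Return I[y:y+h, x:x+w] for a box b = [x, y, w, h] (top-left corner plus size)."""
    x, y, w, h = box
    return image[y:y + h, x:x + w]


def anomaly_score(e_fused: np.ndarray, t_normal: np.ndarray, t_anomaly: np.ndarray) -> float:
    """s = (e . t_anomaly) / (e . t_anomaly + e . t_normal)."""
    sim_anomaly = float(np.dot(e_fused, t_anomaly))
    sim_normal = float(np.dot(e_fused, t_normal))
    # Added guard (assumption): the ratio is only a [0, 1] score for positive similarities.
    if sim_anomaly <= 0 or sim_normal <= 0:
        raise ValueError("ratio score assumes positive image-text similarities")
    return sim_anomaly / (sim_anomaly + sim_normal)


image = np.arange(30).reshape(5, 6)       # toy 5x6 "image"
patch = crop_object(image, (1, 2, 3, 2))  # x=1, y=2, w=3, h=2
print(patch.tolist())                     # [[13, 14, 15], [19, 20, 21]]

# Toy unit vectors: the image embedding is closer to the "anomaly" text embedding.
s = anomaly_score(np.array([1.0, 0.0]), np.array([0.6, 0.8]), np.array([0.8, 0.6]))
print(round(s, 3))                        # 0.571, i.e. 0.8 / (0.8 + 0.6)
```

Because \( s \) is a ratio of two similarities, it depends only on their relative size, so values above 0.5 mean the "anomaly" prompts win the comparison.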

Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection

Towards Zero-shot Point Cloud Anomaly Detection: A Multi-View Projection Framework

Uni-3DAD: GAN-Inversion Aided Universal 3D Anomaly Detection on Model-free Products

AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection

PTMNet: Pixel-Text Matching Network for Zero-Shot Anomaly Detection

TF2: Few-shot Text-Free Training-Free Defect Image Generation for Industrial Anomaly Inspection

A Machine Vision-based Realtime Anomaly Detection Method for Industrial Products Using Deep Learning

CLIP-AD: A Language-Guided Staged Dual-Path Model for Zero-shot Anomaly Detection

Industrial Product Surface Anomaly Detection with Realistic Synthetic Anomalies Based on Defect Map Prediction

MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images

FADE: Few-shot/zero-shot Anomaly Detection Engine using Large Vision-Language Model

VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation

AnomalySD: Few-Shot Multi-Class Anomaly Detection with Stable Diffusion Model

VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection

Vision-Language Interaction via Contrastive Learning for Surface Anomaly Detection in Consumer Electronics Manufacturing

Unsupervised Automatic Defect Inspection based on Image Matching and Local One-class Classification

AnomalySeg: Deep Learning-Based Fast Anomaly Segmentation Approach for Surface Defect Detection

FD-UAD: Unsupervised Anomaly Detection Platform Based on Defect Autonomous Imaging and Enhancement

AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios

Exploring Deep Learning-based Unsupervised Image Anomaly Detection and Localization Methods for Industrial Quality Assurance

Anomaly detection for industrial quality assurance: A comparative evaluation of unsupervised deep learning models