Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection

Chengjie Wang, Wenbing Zhu, Bin-Bin Gao, Zhenye Gan, Jianning Zhang, Zhihao Gu, Shuguang Qian, Mingang Chen, Lizhuang Ma
2024-03-19
Abstract: Industrial anomaly detection (IAD) has garnered significant attention and experienced rapid development. However, the recent development of IAD approaches has encountered certain difficulties due to dataset limitations. On the one hand, most state-of-the-art methods have reached saturation (over 99% in AUROC) on mainstream datasets such as MVTec, so the differences between methods cannot be well distinguished, leading to a significant gap between public datasets and actual application scenarios. On the other hand, research on various new practical anomaly detection settings is limited by the scale of the datasets, posing a risk of overfitting in evaluation results. Therefore, we propose a large-scale, Real-world, and multi-view Industrial Anomaly Detection dataset, named Real-IAD, which contains 150K high-resolution images of 30 different objects, an order of magnitude larger than existing datasets. It has a larger range of defect area and ratio proportions, making it more challenging than previous datasets. To make the dataset closer to real application scenarios, we adopted a multi-view shooting method and proposed sample-level evaluation metrics. In addition, beyond the general unsupervised anomaly detection setting, we propose a new setting for Fully Unsupervised Industrial Anomaly Detection (FUIAD) based on the observation that the yield rate in industrial production is usually greater than 60%, which has more practical application value. Finally, we report the results of popular IAD methods on the Real-IAD dataset, providing a highly challenging benchmark to promote the development of the IAD field.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses the limitations of existing datasets in the field of Industrial Anomaly Detection (IAD). Although mainstream datasets such as MVTec AD and VisA have driven the development of IAD methods, they have the following deficiencies:

1. **Limited data scale**: Existing datasets contain relatively few samples. For example, MVTec AD contains only 5,354 images and VisA contains 10,821 images. This can lead to overfitting in evaluation results and makes it difficult to distinguish the performance of different methods.
2. **Small defect range and proportion**: The range and proportion of defective areas in existing datasets are small, so these datasets cannot fully reflect the complexity of practical application scenarios.
3. **Lack of multi-view images**: Most existing 2D IAD datasets contain only single-view images, whereas real objects have complex structures and a single view cannot cover all defects.
4. **Dependence on manual labeling**: Although IAD is usually treated as an unsupervised learning task, the normal samples used for training still have to be selected by hand, which increases labor costs and can let noisy samples into the training set.

To address these problems, the authors propose a new large-scale, multi-view industrial anomaly detection dataset, Real-IAD. The dataset has the following characteristics:

- **Large-scale**: It contains 30 different categories of objects, with 5 shooting angles for each category, totaling 150,000 high-resolution images.
- **Multi-view**: Each sample consists of images taken from several different perspectives, which addresses the problem that single-view images cannot comprehensively capture defects.
- **More challenging**: The range and proportion of defective areas are larger, which makes performance differences between methods easier to measure.
- **Fully unsupervised setting**: A new Fully Unsupervised Industrial Anomaly Detection (FUIAD) setting is proposed. Based on the observation that the yield rate of a production line is usually greater than 60%, a certain proportion of anomalous samples is added to the training set, which is closer to practical application scenarios.

By constructing the Real-IAD dataset, the authors hope to advance the IAD field, encourage more efficient and practical detection methods, and provide stronger technical support for industrial production.

### Summary

The main contributions of this paper are:

1. A large-scale, multi-view industrial anomaly detection dataset, Real-IAD, which contains 150K high-resolution images covering 30 categories.
2. A fully unsupervised IAD setting (FUIAD) that is closer to practical application scenarios and relies on a naturally occurring constraint (a production-line yield rate greater than 60%) instead of additional manual labeling.
3. An evaluation of a variety of popular IAD methods on the new dataset, providing a highly challenging benchmark to promote the improvement of algorithms.
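
To make the FUIAD setting more concrete, the following is a minimal sketch of how an unlabeled training set with a controlled share of anomalous samples could be assembled. It is not the authors' protocol: the function name `build_noisy_train_set`, the parameters `noise_ratio` and `train_size`, and the sample identifiers are all hypothetical, and the only constraint taken from the text is that the yield rate stays above 60%, so the anomalous share is kept below 40%.

```python
import random


def build_noisy_train_set(normal_ids, anomalous_ids, noise_ratio, train_size, seed=0):
    """Return a shuffled, unlabeled training list with about `noise_ratio` anomalous items.

    Hypothetical helper for illustration; the split sizes and ratios used in
    the paper are not given in this summary.
    """
    # A yield rate above 60% means fewer than 40% of samples are anomalous.
    if not 0.0 <= noise_ratio < 0.4:
        raise ValueError("noise_ratio must be in [0, 0.4) to keep the yield rate above 60%")

    n_anomalous = round(train_size * noise_ratio)
    n_normal = train_size - n_anomalous
    if n_anomalous > len(anomalous_ids) or n_normal > len(normal_ids):
        raise ValueError("not enough samples for the requested size and ratio")

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train = rng.sample(normal_ids, n_normal) + rng.sample(anomalous_ids, n_anomalous)
    rng.shuffle(train)  # labels are discarded: the model only sees a mixed pool
    return train


# Toy identifiers standing in for image samples.
normal = [f"ok_{i}" for i in range(200)]
anomalous = [f"ng_{i}" for i in range(50)]

train = build_noisy_train_set(normal, anomalous, noise_ratio=0.1, train_size=100)
n_ng = sum(name.startswith("ng_") for name in train)
print(len(train), n_ng)  # prints "100 10"
```

Sweeping `noise_ratio` over several values below 0.4 would then show how a method's accuracy degrades as the training pool becomes less clean.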