A PRISMA Driven Systematic Review of Publicly Available Datasets for Benchmark and Model Developments for Industrial Defect Detection

Can Akbas, Irem Su Arin, Sinan Onal
2024-06-12
Abstract: Recent advancements in quality control across various industries have increasingly utilized the integration of video cameras and image processing for effective defect detection. A critical barrier to progress is the scarcity of comprehensive datasets featuring annotated defects, which are essential for developing and refining automated defect detection models. This systematic review, spanning from 2015 to 2023, identifies 15 publicly available datasets and critically examines them to assess their effectiveness and applicability for benchmarking and model development. Our findings reveal a diverse landscape of datasets, such as NEU-CLS, NEU-DET, DAGM, KolektorSDD, PCB Defect Dataset, and the Hollow Cylindrical Defect Detection Dataset, each with unique strengths and limitations in terms of image quality, defect type representation, and real-world applicability. The goal of this systematic review is to consolidate these datasets in a single location, providing researchers who seek such publicly available resources with a comprehensive reference.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses the scarcity of high-quality labeled datasets for industrial defect detection. By systematically reviewing publicly available datasets from 2015 to 2023, the authors evaluate how effective and applicable these datasets are for benchmarking and model development. The main objective is to consolidate the datasets into a comprehensive reference, so that researchers can select and use the ones suited to their specific defect detection requirements and thereby improve the reliability and efficiency of quality control.

### Main Problems

1. **Dataset Scarcity**: High-quality defect detection datasets with detailed labels are scarce, which limits the development and optimization of automated defect detection models.
2. **Lack of Unified Standards**: Existing datasets vary in image quality, defect type representation, and practical applicability, and they follow no unified standard.
3. **Insufficient Benchmark Testing**: Systematic, up-to-date reviews of datasets are lacking, which makes it difficult to evaluate the performance of different models fairly and effectively.

### Solutions

- **Systematic Review**: Adopt the PRISMA 2020 guidelines to systematically identify and evaluate 15 publicly available datasets.
- **Comprehensive Evaluation**: Analyze the characteristics, advantages, and limitations of each dataset to provide a comprehensive reference.
- **Promote Research**: Provide an integrated resource library for researchers and industry practitioners to support their experimental and operational needs.

### Specific Datasets

- **NEU-CLS** and **NEU-DET**: From Northeastern University, containing 1,800 grayscale images covering six common surface defects.
- **DAGM**: Provided by the German Association for Pattern Recognition (DAGM), containing 16,100 artificially generated images that simulate real-world problems.
- **KolektorSDD**: Provided by the Kolektor Group, containing 399 high-resolution images, focusing on texture defect detection.
- **PCB Defect Dataset**: From Peking University, containing 1,386 images, specifically for printed circuit board defect detection.
- **Hollow Cylindrical Defect Detection Dataset**: Containing 2,142 images, for cylindrical surface defect detection.

By systematically reviewing these datasets, the paper gives researchers in industrial defect detection a consolidated resource that can support further development of the field.
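
As a small illustration of how the datasets listed above could be consolidated for selection, the following sketch stores them in a Python catalog and filters by image count. The names and image counts are taken from the summary; the `DefectDataset` record layout and the `datasets_with_at_least` helper are hypothetical and are not part of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefectDataset:
    """One catalog entry (hypothetical layout, not from the paper)."""
    name: str
    images: int
    focus: str


# Names, image counts, and focus areas are copied from the summary above.
CATALOG = [
    DefectDataset("NEU-CLS / NEU-DET", 1800, "six common surface defects, grayscale"),
    DefectDataset("DAGM", 16100, "artificially generated images"),
    DefectDataset("KolektorSDD", 399, "texture defect detection"),
    DefectDataset("PCB Defect Dataset", 1386, "printed circuit board defects"),
    DefectDataset("Hollow Cylindrical Defect Detection Dataset", 2142,
                  "cylindrical surface defects"),
]


def datasets_with_at_least(min_images, catalog=CATALOG):
    """Return entries with at least `min_images` images, largest first."""
    matches = [d for d in catalog if d.images >= min_images]
    return sorted(matches, key=lambda d: d.images, reverse=True)


if __name__ == "__main__":
    for d in datasets_with_at_least(1500):
        print(f"{d.name}: {d.images} images ({d.focus})")
```

Image count is only one possible selection criterion; a fuller catalog would also need to record the image quality and defect type coverage that the review compares.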