Abstract: The identification of imperfections on steel surfaces is vital for ensuring the quality of industrial products. It requires real-time detection with high accuracy. This paper proposes the CRGF-YOLO (Contextual Reparameterized Generalized Feature) model based on YOLOv5. In the network, BottleneckCSP structures and depthwise separable convolutions utilizing structural reparameterization are introduced to reduce the model size and improve performance. In addition, contextual transformer modules are employed as self-attention mechanisms to improve feature representations by capturing long-range dependencies, outperforming conventional convolutional networks. Furthermore, a simplified generalized feature pyramid network is embedded to aggregate multi-scale feature maps and enhance the network's robustness. Finally, four prediction heads with different sizes are employed to predict defects, which are supported by prior bounding boxes generated using the k-means clustering algorithm. The Focal-EIOU (Exponential Intersection over Union) loss function is introduced to improve detection accuracy and expedite model convergence. The improved model achieves a mean average precision (mAP) of 82.2% on the NEU-DET dataset, outperforming the baseline YOLOv5s by 7.7% mAP while maintaining real-time speeds. Comparative evaluations demonstrate CRGF-YOLO's superior performance over previous state-of-the-art methods like Faster R-CNN (77.4% mAP), YOLOv3 (77.4% mAP), YOLOv7s (72.1% mAP), and YOLOv8s (78.7% mAP) for steel surface defect detection. Overall, this study provides valuable insights and practical guidance for the advancement of defect detection technology.
What problem does this paper attempt to address?
This paper aims to improve both detection accuracy and real-time performance in steel surface defect detection. Specifically, it proposes CRGF-YOLO (Contextual Reparameterized Generalized Feature YOLO), an optimized multi-scale feature fusion model based on YOLOv5. The model introduces structural reparameterization, contextual transformer modules, and a simplified Generalized Feature Pyramid Network (GFPN) to strengthen feature extraction at different scales and raise detection accuracy on steel surface defects, while maintaining real-time detection speed.
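The idea behind structural reparameterization can be illustrated with a toy example. The sketch below is not the paper's Rep-DSC block, whose details are not given here; it assumes a generic RepVGG-style setup in one dimension with a single channel, where a 3-tap branch, a 1-tap branch, and an identity branch used during training are folded into one 3-tap kernel for inference. The function names `conv1d_same` and `merge_branches` are hypothetical.

```python
def conv1d_same(x, kernel):
    """Cross-correlate x with an odd-length kernel, zero-padded to keep the length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(x))
    ]


def merge_branches(k3, k1):
    """Fold a 1-tap branch and an identity branch into a 3-tap kernel."""
    merged = list(k3)
    merged[1] += k1[0]  # the 1-tap kernel aligns with the center tap
    merged[1] += 1.0    # identity equals a 1-tap kernel with weight 1
    return merged


x = [1.0, -2.0, 0.5, 3.0, 4.0]
k3 = [0.25, -0.5, 1.0]  # illustrative weights, not taken from the paper
k1 = [2.0]

# Training-time form: three parallel branches whose outputs are summed.
multi_branch = [
    a + b + c
    for a, b, c in zip(conv1d_same(x, k3), conv1d_same(x, k1), x)
]

# Inference-time form: a single convolution with the merged kernel.
single_branch = conv1d_same(x, merge_branches(k3, k1))
```

Because convolution is linear, both forms produce the same output, so the extra branches add no cost at inference time.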
### Background and Challenges of the Paper
1. **Metal products are vulnerable to defects during manufacturing**: Examples include surface oxidation, cracks, dents, and scratches. These defects degrade product quality and durability and can cause economic losses.
2. **Limitations of traditional defect detection algorithms**: Traditional vision-based methods such as HOG, LBP, the Fourier transform, Gabor filters, and SVMs rely on manually extracted features. These features struggle to express information fully in complex situations, are computationally expensive, and do not readily solve the problems of defect location and size.
3. **Limitations of Convolutional Neural Networks (CNNs)**: As network depth increases, CNN performance on small targets declines, because a larger receptive field is likely to ignore or misrepresent their details.
4. **The rise of Transformer models**: Models such as Vision Transformer (ViT) and DETR capture global context through self-attention, which improves feature representation but demands substantial computation and memory.
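The trade-off in the last point can be made concrete with a minimal sketch of scaled dot-product self-attention. This is a simplified illustration rather than the ViT, DETR, or CoT formulation: it assumes the queries, keys, and values are the input tokens themselves, with no learned projection matrices, and the names `softmax` and `self_attention` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product attention with Q = K = V = tokens (an assumption)."""
    dim = len(tokens[0])
    # One score per (query, key) pair: an n x n matrix for n tokens.
    scores = [
        [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in tokens]
        for query in tokens
    ]
    weights = [softmax(row) for row in scores]
    # Each output is a weighted mix of all tokens, near or far.
    outputs = [
        [sum(w * value[c] for w, value in zip(row, tokens)) for c in range(dim)]
        for row in weights
    ]
    return weights, outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
weights, outputs = self_attention(tokens)
```

Every output position draws on every input position, which is how long-range dependencies are captured. The `weights` matrix has one entry per pair of tokens, so its size grows quadratically with the number of tokens, which is the source of the computation and memory cost.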
### Main Contributions of CRGF-YOLO
1. **Structural reparameterization**: Introduces a reparameterized CSP (Cross Stage Partial) structure and reparameterized depthwise separable convolutions (Rep-DSC) to reduce model complexity and enhance feature extraction.
2. **Contextual Transformer Module**: Combines CNNs with Transformers, using contextual transformer (CoT) modules to enhance feature representation at different layers and pass useful information to the neck.
3. **Simplified Generalized Feature Pyramid Network (GFPN)**: Uses a simplified GFPN to fuse feature maps of different scales and improve the network's generalization ability.
4. **Multi-scale prediction heads**: Uses four prediction heads of different sizes, with prior bounding boxes generated by the k-means clustering algorithm, and introduces the Focal-EIOU loss function to speed up convergence and improve detection accuracy.
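The loss in the last contribution can be sketched for a single pair of boxes. The sketch assumes the commonly cited Focal-EIoU formulation, in which the EIoU term adds center-distance, width, and height penalties (each normalized by the smallest enclosing box) to `1 - IoU`, and the result is weighted by `IoU ** gamma`. The paper's exact variant may differ. The function name, the `(x1, y1, x2, y2)` box format, and the default `gamma=0.5` are assumptions for illustration.

```python
def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-9):
    """Focal-EIoU for two boxes given as (x1, y1, x2, y2), x1 < x2 and y1 < y2."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between the two box centers.
    center_dist2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
        + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2

    eiou = (
        1.0 - iou
        + center_dist2 / (cw ** 2 + ch ** 2 + eps)  # center penalty
        + (pw - tw) ** 2 / (cw ** 2 + eps)          # width penalty
        + (ph - th) ** 2 / (ch ** 2 + eps)          # height penalty
    )
    # Focal weighting: boxes with higher IoU receive a larger weight.
    return (iou ** gamma) * eiou


perfect = focal_eiou_loss((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
shifted = focal_eiou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

Under this formulation the loss is near zero for a perfect match and positive for a partially overlapping pair. Note that the `IoU ** gamma` weight also drives the loss to zero for boxes that do not overlap at all, so practical implementations may handle that case differently.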
### Experimental Results
- **Performance improvement**: CRGF-YOLO achieves a mean average precision (mAP) of 82.2% on the NEU-DET dataset, 7.7% mAP higher than the baseline YOLOv5s, while maintaining real-time detection speed.
- **Comparison with other methods**: CRGF-YOLO outperforms Faster R-CNN (77.4% mAP), YOLOv3 (77.4% mAP), YOLOv7s (72.1% mAP), and YOLOv8s (78.7% mAP) on the steel surface defect detection task.
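The margins implied by these figures can be worked out directly. The short calculation below uses only the numbers reported above; reading the "7.7% mAP" gain as absolute percentage points, rather than a relative improvement, is an assumption, and the variable names are illustrative.

```python
crgf_map = 82.2  # reported mAP of CRGF-YOLO on NEU-DET, in percent

reported = {
    "Faster R-CNN": 77.4,
    "YOLOv3": 77.4,
    "YOLOv7s": 72.1,
    "YOLOv8s": 78.7,
}

# Gap between CRGF-YOLO and each compared method, in percentage points.
margins = {name: round(crgf_map - value, 1) for name, value in reported.items()}

# Baseline YOLOv5s mAP implied if the 7.7 gain is in absolute points (assumption).
implied_baseline = round(crgf_map - 7.7, 1)

for name, gap in margins.items():
    print(f"{name}: +{gap} points")
print(f"Implied YOLOv5s baseline: {implied_baseline}%")
```

On this reading, the smallest margin is over YOLOv8s and the largest is over YOLOv7s.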
### Conclusion
By optimizing the YOLOv5 model, this research achieves high-precision, real-time steel surface defect detection, and it offers valuable insights and practical guidance for the development of defect detection technology.