CINFormer: Transformer network with multi-stage CNN feature injection for surface defect segmentation

Xiaoheng Jiang, Kaiyi Guo, Yang Lu, Feng Yan, Hao Liu, Jiale Cao, Mingliang Xu, Dacheng Tao
2023-09-22
Abstract: Surface defect inspection is of great importance for industrial manufacturing and production. Although deep-learning-based defect inspection methods have made significant progress, challenges remain, such as weak defects that are hard to distinguish and defect-like interference in the background. To address these issues, we propose CINFormer, a transformer network with multi-stage CNN (Convolutional Neural Network) feature injection for surface defect segmentation, built on a UNet-like structure. CINFormer introduces a simple yet effective feature integration mechanism that injects multi-level CNN features of the input image into different stages of the transformer encoder. This preserves both the strength of CNNs in capturing fine detail and the strength of transformers in suppressing background noise, which facilitates accurate defect detection. In addition, CINFormer introduces a Top-K self-attention module that focuses on the tokens carrying the most defect-relevant information, further reducing the impact of redundant background. Extensive experiments on the surface defect datasets DAGM 2007, Magnetic tile, and NEU show that CINFormer achieves state-of-the-art performance in defect detection.
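The abstract does not specify the fusion operator used for feature injection. The Python sketch below assumes one common choice: flatten a CNN feature map of shape C x H x W into H*W tokens, project each token to the transformer width with a linear map, and add it element-wise to the transformer tokens of the stage with the same spatial resolution. The functions `flatten_feature_map`, `linear`, and `inject` are hypothetical names for illustration, not CINFormer's code, and plain Python lists stand in for tensors.

```python
def flatten_feature_map(fmap):
    """Turn a C x H x W feature map (nested lists) into H*W tokens of length C,
    in row-major (y, x) order."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][y][x] for c in range(channels)]
            for y in range(height) for x in range(width)]


def linear(tokens, weight):
    """Project each token of length C_in with a C_in x C_out weight matrix."""
    c_out = len(weight[0])
    return [[sum(t[i] * weight[i][j] for i in range(len(t))) for j in range(c_out)]
            for t in tokens]


def inject(transformer_tokens, cnn_fmap, weight):
    """Hypothetical injection step (assumed fusion: projection + addition):
    project CNN tokens to the transformer width and add them element-wise."""
    cnn_tokens = linear(flatten_feature_map(cnn_fmap), weight)
    if len(cnn_tokens) != len(transformer_tokens):
        raise ValueError("CNN map and transformer stage must share a spatial size")
    if len(weight[0]) != len(transformer_tokens[0]):
        raise ValueError("projection width must match the transformer width")
    return [[a + b for a, b in zip(t, c)]
            for t, c in zip(transformer_tokens, cnn_tokens)]


if __name__ == "__main__":
    # Two CNN channels on a 2 x 2 grid give 4 tokens; transformer width is 3.
    fmap = [[[1.0, 2.0], [3.0, 4.0]],
            [[0.5, 0.5], [0.5, 0.5]]]
    weight = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]]
    tokens = [[0.0, 0.0, 1.0] for _ in range(4)]
    print(inject(tokens, fmap, weight))
    # prints [[1.0, 0.5, 1.0], [2.0, 0.5, 1.0], [3.0, 0.5, 1.0], [4.0, 0.5, 1.0]]
```

In a multi-stage encoder, this step would be repeated at every stage, pairing each transformer stage with the CNN level whose spatial size matches; in a trained model the projection weights would be learned rather than fixed.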
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses surface defect detection in industrial manufacturing. Although deep-learning-based methods have made significant progress, two challenges remain: subtle (weak) defects are difficult to distinguish, and the background contains defect-like interference. To tackle these problems, the authors propose CINFormer, a U-Net-like architecture that combines convolutional neural networks (CNNs) with a transformer. Multi-level CNN features of the input image are injected into different stages of the transformer encoder, so the network keeps the CNN's ability to capture fine detail while benefiting from the transformer's ability to suppress background noise. CINFormer also introduces a Top-K self-attention module that focuses on the tokens carrying the most defect-relevant information, further reducing the influence of redundant background. Together, these components are intended to help detect subtle or small defects in complex scenes.

Experiments on three surface defect datasets (DAGM 2007, Magnetic tile, and NEU) show that CINFormer achieves state-of-the-art performance on defect detection, supporting the effectiveness of the design.
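The Top-K idea can be illustrated with a minimal sketch of scaled dot-product attention in which each query keeps only its `k` highest-scoring keys and masks the rest before the softmax. This assumes the common row-wise Top-K formulation; the summary does not say how CINFormer chooses `k` or exactly where its mask is applied, and `top_k_attention` is a hypothetical name. Single-head attention over plain Python lists is used for readability.

```python
import math


def softmax(xs):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_attention(queries, keys, values, k):
    """Single-head scaled dot-product attention where each query attends
    only to its k highest-scoring keys (assumed row-wise Top-K masking)."""
    if not 1 <= k <= len(keys):
        raise ValueError("k must be between 1 and the number of keys")
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        # Similarity of this query to every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
        # Indices of the k largest scores (stable sort: ties go to the lower index).
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        keep = set(ranked[:k])
        # Masked-out keys get -inf, so softmax assigns them zero weight.
        masked = [s if j in keep else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(masked)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


if __name__ == "__main__":
    queries = [[1.0, 0.0]]
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[1.0], [2.0], [3.0]]
    # With k=1 only key 0 survives, so the output is its value.
    print(top_k_attention(queries, keys, values, k=1))  # prints [[1.0]]
```

With `k` equal to the number of keys nothing is masked and the function reduces to ordinary scaled dot-product attention; smaller values of `k` drop low-scoring keys entirely, which is how redundant background tokens would be excluded under this assumed formulation.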