A Unet-inspired spatial-attention transformer model for segmenting gear tooth surface defects

Xin Zhou, Yongchao Zhang, Zhaohui Ren, Tianchuan Mi, Zeyu Jiang, Tianzhuang Yu, Shihua Zhou
DOI: https://doi.org/10.1016/j.aei.2024.102933
IF: 8.8
2024-11-22
Advanced Engineering Informatics
Abstract: Automated vision-based defect detection is a crucial step in monitoring product quality in industrial production. Despite the widespread use of deep learning methods for surface defect identification, several challenges persist in gear applications. First, there is a lack of defect detection methods tailored specifically to gear tooth surfaces. Second, because surface defects vary in size, the regular single-scale attention computation at each transformer layer tends to compromise spatial information. To address these challenges, we propose a novel U-shaped spatial-attention transformer model for tooth surface defect detection. A shunted-window method is introduced to create a pyramid receptive field within a single self-attention layer. This method captures fine-grained features with a small window while preserving coarse-grained features with a larger window. Consequently, it enables effective multi-scale information fusion and accommodates objects of different sizes. We curate a dataset of defective samples collected under various working conditions using the CL-100 gear wear machine. Experimental results demonstrate that the proposed model outperforms the state-of-the-art (SOTA) U-shaped SwinUnet by +8.74% AP and +4.40% Sm, and surpasses the strong defect detection method ResT-UperNet by +0.63% AP and +4.69% Sm.
Engineering, Multidisciplinary; Computer Science, Artificial Intelligence
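
The abstract describes a shunted-window method that mixes a small and a larger attention window inside a single self-attention layer. The sketch below illustrates that general idea only and is not the authors' implementation. It assumes that the channels are split evenly between window sizes, that windows do not overlap, and that the query, key and value projections are identity maps. All function names (`window_partition`, `window_merge`, `window_attention`, `multi_window_attention`) and the window sizes 2 and 4 are hypothetical choices made for illustration.

```python
import numpy as np


def window_partition(x, ws):
    """Split an (H, W, C) map into non-overlapping ws x ws windows.

    Returns an array of shape (num_windows, ws * ws, C).
    """
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)


def window_merge(windows, ws, H, W):
    """Inverse of window_partition: rebuild the (H, W, C) map."""
    C = windows.shape[-1]
    x = windows.reshape(H // ws, W // ws, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)


def softmax(a, axis=-1):
    """Numerically stable softmax."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def window_attention(x, ws):
    """Self-attention restricted to each ws x ws window.

    Assumption: identity Q/K/V projections, to keep the sketch short.
    """
    H, W, C = x.shape
    tokens = window_partition(x, ws)                       # (nW, N, C)
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(C)
    out = softmax(scores) @ tokens                         # (nW, N, C)
    return window_merge(out, ws, H, W)


def multi_window_attention(x, window_sizes=(2, 4)):
    """Give each channel group its own window size, then concatenate.

    Assumption: C is divisible by len(window_sizes), and H and W are
    divisible by every window size.
    """
    groups = np.split(x, len(window_sizes), axis=-1)
    outs = [window_attention(g, ws) for g, ws in zip(groups, window_sizes)]
    return np.concatenate(outs, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.standard_normal((8, 8, 4))
    fused = multi_window_attention(feature_map)
    print(fused.shape)
```

In this sketch the first channel group only mixes information within 2 x 2 neighbourhoods (fine detail), while the second mixes within 4 x 4 neighbourhoods (coarser context), so the concatenated output carries both scales from one layer.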