MSTAD: A masked subspace-like transformer for multi-class anomaly detection

Borui Kang, Yuzhong Zhong, Zhimin Sun, Lin Deng, Maoning Wang, Jianwei Zhang
DOI: https://doi.org/10.1016/j.knosys.2023.111186
IF: 8.139
2024-01-01
Knowledge-Based Systems
Abstract: Unsupervised anomaly detection techniques, which do not rely on prior knowledge of anomalies, have attracted considerable attention in the field of industrial surface inspection. However, existing approaches commonly employ separate models for each product class, resulting in substantial storage requirements and inefficiency during the training phase. Accordingly, we propose a masked subspace-like transformer for multi-class anomaly detection (MSTAD), which employs an encoder–decoder architecture to reconstruct the pre-trained image features by recognizing the greater resilience of high-level semantic features compared with low-level pixel features. To address the issue of identity mapping, which refers to the tendency of a model to overgeneralize when reconstructing abnormal samples, MSTAD integrates two essential components: the multi-layer subspace-like embedding (MLSE) module and random block mask (RBM) method. The MLSE module incorporates an attention mechanism to selectively emphasize the common embeddings associated with each class, thereby enhancing both the ability of the model to reconstruct anomalies as normal and its capacity for training. RBM applies a random mask block mechanism to the pre-trained feature map to enhance the comprehension ability of the model and improve the reconstruction of normal features. We conducted extensive experiments on the MVTec AD and BTAD datasets, and the results demonstrated that MSTAD outperformed previous state-of-the-art methods in terms of anomaly detection and localization performance for multi-class anomaly detection tasks.
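The random block mask idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `random_block_mask`, the parameters `block_size` and `mask_ratio`, the (C, H, W) feature layout, and the choice to fill masked positions with zeros are all assumptions made for illustration only.

```python
import numpy as np

def random_block_mask(features, block_size=2, mask_ratio=0.25, rng=None):
    """Zero out randomly chosen square blocks of a (C, H, W) feature map.

    Hypothetical helper: block_size, mask_ratio, and zero-filling are
    illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = features.shape
    # Number of blocks along each spatial axis (ceiling division covers edges).
    grid_h = -(-h // block_size)
    grid_w = -(-w // block_size)
    n_blocks = grid_h * grid_w
    n_masked = int(round(mask_ratio * n_blocks))
    # Choose which blocks to mask, without replacement.
    chosen = rng.choice(n_blocks, size=n_masked, replace=False)
    block_mask = np.zeros(n_blocks, dtype=bool)
    block_mask[chosen] = True
    block_mask = block_mask.reshape(grid_h, grid_w)
    # Expand the block-level mask to spatial resolution and crop to (H, W).
    mask = block_mask.repeat(block_size, axis=0).repeat(block_size, axis=1)[:h, :w]
    # Apply the same spatial mask to every channel.
    masked = np.where(mask[None, :, :], 0.0, features)
    return masked, mask

# Usage: mask a stand-in for a pre-trained feature map.
features = np.ones((4, 8, 8))
masked, mask = random_block_mask(features, rng=np.random.default_rng(0))
```

In a reconstruction-based setup, a model would then be trained to recover the original features from `masked`, which discourages it from simply copying its input.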
computer science, artificial intelligence