Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection

Canhui Tang, Sanping Zhou, Yizhe Li, Yonghao Dong, Le Wang
2024-05-03
Abstract: With the wide application of knowledge distillation between an ImageNet pre-trained teacher model and a learnable student model, industrial anomaly detection has made significant progress in the past few years. The success of knowledge distillation mainly relies on preserving the feature discrepancy between the teacher and student models, which assumes that: (1) the teacher model can jointly represent two different distributions for the normal and abnormal patterns, while (2) the student model can only reconstruct the normal distribution. However, maintaining these ideal assumptions in practice remains challenging. In this paper, we propose a simple yet effective two-stage industrial anomaly detection framework, termed AAND, which sequentially performs Anomaly Amplification and Normality Distillation to obtain a robust feature discrepancy. In the first stage, anomaly amplification, we propose a novel Residual Anomaly Amplification (RAA) module to advance the pre-trained teacher encoder. Exposed to synthetic anomalies, it amplifies anomalies via residual generation while maintaining the integrity of the pre-trained model. It mainly comprises a Matching-guided Residual Gate and an Attribute-scaling Residual Generator, which determine the residuals' proportion and characteristic, respectively. In the second stage, normality distillation, we employ a reverse distillation paradigm to train a student decoder, in which a novel Hard Knowledge Distillation (HKD) loss is built to better facilitate the reconstruction of normal patterns. Comprehensive experiments on the MvTecAD, VisA, and MvTec3D-RGB datasets show that our method achieves state-of-the-art performance.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper asks how to make the feature discrepancy between the pre-trained teacher model and the student model more robust in industrial anomaly detection (IAD), so that anomalies are detected more reliably. Specifically, it revisits the two assumptions that distillation-based methods depend on:

1. **The teacher encoder can represent the distributions of normal and abnormal patterns simultaneously.** The paper points out that the performance of existing methods declines significantly when the teacher model is randomly initialized, and that even a pre-trained teacher struggles to produce highly discriminative features for IAD. Two causes are given. First, the task objectives differ: pre-trained models aim to distinguish different object categories, while IAD must distinguish normal from abnormal instances within the same object category. Second, the scarcity of abnormal data may cause pre-trained models to learn semantics that are irrelevant to the IAD task.
2. **The student decoder can reconstruct only the distribution of normal patterns.** Existing methods usually rely on natural constraints to keep the student model from reconstructing abnormal patterns, but these methods perform poorly on complex normal patterns, such as fine-grained normal textures and rare normal patterns.

To address these problems, the paper proposes a two-stage industrial anomaly detection framework (AAND):

1. **Anomaly amplification stage.** A new Residual Anomaly Amplification (RAA) module uses synthetic anomalies to improve the anomaly detection ability of the pre-trained teacher model. The RAA module contains a Matching-guided Residual Gate and an Attribute-scaling Residual Generator, which set the proportion and the characteristics of the residual, respectively. The goal of this stage is to amplify abnormal features while maintaining the integrity of the pre-trained model.
2. **Normality distillation stage.** The student decoder is trained with the reverse distillation paradigm so that it reconstructs only the distribution of normal patterns. A new Hard Knowledge Distillation (HKD) loss is introduced to improve the reconstruction of complex normal patterns. The goal of this stage is to make the student model reconstruct challenging normal patterns more accurately.

By optimizing these two stages, the paper aims to obtain a more robust feature discrepancy between the teacher model and the student model, thereby improving industrial anomaly detection. The reported experiments show state-of-the-art performance on the MvTecAD, VisA, and MvTec3D-RGB datasets.
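
To make the idea of a teacher-student feature discrepancy concrete, the following is a minimal sketch in plain Python. It is not the paper's implementation. It assumes that the discrepancy at each spatial location is measured as a cosine distance between teacher and student feature vectors, which is a common choice in distillation-based anomaly detection but is not stated in the text above. All function names are hypothetical. In particular, `hard_weighted_loss` is only an illustrative stand-in for the general notion of focusing training on hard normal locations; the actual HKD loss is not specified here and may differ.

```python
import math


def cosine_distance(u, v, eps=1e-8):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + eps)


def discrepancy_map(teacher_feats, student_feats):
    """Per-location discrepancy between two H x W grids of C-dim vectors.

    Assumption: a large distance means the student failed to reproduce
    the teacher feature, which is read as evidence of an anomaly.
    """
    return [
        [cosine_distance(t, s) for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher_feats, student_feats)
    ]


def image_score(dmap):
    """Image-level anomaly score: the largest location-level discrepancy."""
    return max(max(row) for row in dmap)


def hard_weighted_loss(dmap, top_fraction=0.5):
    """Hypothetical hard-mining loss, NOT the paper's HKD formulation.

    On a normal training image, average only the largest distances so
    that poorly reconstructed (hard) normal locations dominate the loss.
    """
    flat = sorted((d for row in dmap for d in row), reverse=True)
    k = max(1, int(len(flat) * top_fraction))
    return sum(flat[:k]) / k


# Toy 2 x 2 feature maps with 2 channels; only the last location differs.
teacher = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 0.0]]]
student = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 1.0]]]
toy_map = discrepancy_map(teacher, student)
```

In this toy input the teacher and student agree everywhere except at the last location, where the vectors are orthogonal, so that location receives the largest discrepancy and determines `image_score(toy_map)`. At test time such a map would typically be upsampled to image resolution and thresholded; that step is omitted here.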