Abstract: With the wide application of knowledge distillation between an ImageNet pre-trained teacher model and a learnable student model, industrial anomaly detection has witnessed a significant achievement in the past few years. The success of knowledge distillation mainly relies on how to keep the feature discrepancy between the teacher and student model, in which it assumes that: (1) the teacher model can jointly represent two different distributions for the normal and abnormal patterns, while (2) the student model can only reconstruct the normal distribution. However, it still remains a challenging issue to maintain these ideal assumptions in practice. In this paper, we propose a simple yet effective two-stage industrial anomaly detection framework, termed as AAND, which sequentially performs Anomaly Amplification and Normality Distillation to obtain robust feature discrepancy. In the first anomaly amplification stage, we propose a novel Residual Anomaly Amplification (RAA) module to advance the pre-trained teacher encoder. With the exposure of synthetic anomalies, it amplifies anomalies via residual generation while maintaining the integrity of the pre-trained model. It mainly comprises a Matching-guided Residual Gate and an Attribute-scaling Residual Generator, which can determine the residuals' proportion and characteristic, respectively. In the second normality distillation stage, we further employ a reverse distillation paradigm to train a student decoder, in which a novel Hard Knowledge Distillation (HKD) loss is built to better facilitate the reconstruction of normal patterns. Comprehensive experiments on the MvTecAD, VisA, and MvTec3D-RGB datasets show that our method achieves state-of-the-art performance.
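
To make the notion of teacher-student feature discrepancy concrete, here is a minimal sketch of how such a discrepancy is commonly turned into an anomaly map: a per-location cosine distance between teacher and student feature vectors. The abstract does not specify the scoring rule used by AAND, so the choice of cosine distance, the max-based image score, and all function names (`cosine_distance`, `anomaly_map`, `image_score`) are assumptions made for illustration only.

```python
import math


def cosine_distance(u, v, eps=1e-8):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors.
    return 1.0 - dot / max(norm_u * norm_v, eps)


def anomaly_map(teacher_feats, student_feats):
    """Per-location discrepancy between teacher and student features.

    Both arguments are lists of feature vectors, one per spatial location.
    A large value means the student failed to reproduce the teacher there.
    """
    return [cosine_distance(t, s) for t, s in zip(teacher_feats, student_feats)]


def image_score(amap):
    """Image-level score, taken here as the largest per-location discrepancy."""
    return max(amap)


# Toy example: the student matches the teacher's direction at the first two
# locations and is orthogonal to it at the third.
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student = [[2.0, 0.0], [0.0, 3.0], [1.0, -1.0]]
amap = anomaly_map(teacher, student)
print(amap, image_score(amap))
```

Under the two assumptions stated in the abstract, the discrepancy stays small at normal locations, where the student reconstructs the teacher's features, and grows at abnormal ones, where it cannot.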

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to increase the feature differences between the pre-trained teacher model and the student model in industrial anomaly detection (IAD) to achieve more robust anomaly detection. Specifically, the paper proposes improvement methods for two main assumptions:

1. **The teacher encoder can represent different distributions of normal and abnormal patterns simultaneously**:
   - The paper points out that the performance of existing methods will significantly decline when using randomly initialized teacher models. Even when using pre-trained models, it is difficult to generate highly discriminative features to solve the IAD problem. This is mainly because of different task objectives (pre-trained models aim to distinguish different object categories, while the IAD task needs to distinguish normal and abnormal instances within the same object category), and the scarcity of abnormal data may cause pre-trained models to learn some semantics irrelevant to the IAD task.
2. **The student decoder can only reconstruct the distribution of normal patterns**:
   - Existing methods usually use natural constraints to ensure that the student model does not reconstruct abnormal patterns, but these methods perform poorly when dealing with complex normal patterns (such as fine-grained normal textures and rare normal patterns).

To address these problems, the paper proposes a two-stage industrial anomaly detection framework (AAND), specifically including:

1. **Anomaly amplification stage**:
   - A new residual-based anomaly amplification (RAA) module is introduced to enhance the anomaly detection ability of the pre-trained teacher model by synthesizing anomalies. The RAA module contains a matching-guided residual gate and an attribute-scaled residual generator, which are respectively used to adjust the proportion and characteristics of the residual. The goal of this stage is to amplify abnormal features while maintaining the integrity of the pre-trained model.
2. **Normality distillation stage**:
   - The student decoder is trained using the reverse distillation paradigm so that it can only reconstruct the distribution of normal patterns. A new hard-knowledge distillation (HKD) loss is introduced, which is specifically used to improve the reconstruction of complex normal patterns. The goal of this stage is to ensure that the student model can perform more accurate reconstruction when dealing with challenging normal patterns.

Through the optimization of these two stages, the paper aims to achieve more robust feature differences between the teacher model and the student model, thereby improving the performance of industrial anomaly detection. Experimental results show that this method has achieved state-of-the-art performance on multiple datasets.
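
The idea of concentrating distillation on challenging normal patterns can be illustrated with a generic hard-mining loss: average the teacher-student cosine distance over only the hardest fraction of locations. This is not the paper's HKD loss, whose exact formulation is not given in this summary; the top-k selection rule, the `hard_ratio` parameter, and the function names are hypothetical choices for this sketch.

```python
import math


def cosine_distance(u, v, eps=1e-8):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / max(norm_u * norm_v, eps)


def hard_distillation_loss(teacher_feats, student_feats, hard_ratio=0.1):
    """Mean cosine distance over the hardest locations only (assumed rule).

    teacher_feats / student_feats: lists of feature vectors from NORMAL
    training images, one vector per spatial location.
    hard_ratio: hypothetical fraction of locations treated as "hard".
    """
    dists = sorted(
        (cosine_distance(t, s) for t, s in zip(teacher_feats, student_feats)),
        reverse=True,  # largest reconstruction error first
    )
    k = max(1, math.ceil(hard_ratio * len(dists)))
    return sum(dists[:k]) / k


# Toy example: the student reconstructs three of four locations perfectly.
teacher = [[1.0, 0.0]] * 4
student = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(hard_distillation_loss(teacher, student, hard_ratio=0.25))  # hardest location only
print(hard_distillation_loss(teacher, student, hard_ratio=1.0))   # plain mean over all
```

With `hard_ratio=1.0` the function reduces to an ordinary mean cosine-distance loss, so the ratio controls how strongly training focuses on poorly reconstructed normal regions such as fine-grained textures.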

Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection

Pull & Push: Leveraging Differential Knowledge Distillation for Efficient Unsupervised Anomaly Detection and Localization

VDKD: A ViT-Based Student-Teacher Knowledge Distillation for Multi-Texture Class Anomaly Detection

Unsupervised Anomaly Detection via Normal Feature-Enhanced Reverse Teacher–Student Distillation

Anomaly detection based on multi-teacher knowledge distillation

Dual-student knowledge distillation for visual anomaly detection

Unsupervised anomaly detection and localization via bidirectional knowledge distillation

RDMS: Reverse distillation with multiple students of different scales for anomaly detection

Dual-Modeling Decouple Distillation for Unsupervised Anomaly Detection

Remembering Normality: Memory-guided Knowledge Distillation for Unsupervised Anomaly Detection

Reverse Distillation for Continuous Anomaly Detection

Autoencoder-Like Knowledge Distillation Network for Anomaly Detection

Structural Teacher-Student Normality Learning for Multi-Class Anomaly Detection and Localization

Anomaly Detection via Reverse Distillation from One-Class Embedding

A Diffusion-Based Framework for Multi-Class Anomaly Detection

Enhanced multi-scale features mutual mapping fusion based on reverse knowledge distillation for industrial anomaly detection and localization

AEKD: Unsupervised auto-encoder knowledge distillation for industrial anomaly detection

Dual-Student Knowledge Distillation Networks for Unsupervised Anomaly Detection

Context-aware Feature Reconstruction for Class-Incremental Anomaly Detection and Localization

Cosine similarity knowledge distillation for surface anomaly detection

Unlocking the Potential of Reverse Distillation for Anomaly Detection