Revisiting Deep Feature Reconstruction for Logical and Structural Industrial Anomaly Detection

Sukanya Patra, Souhaib Ben Taieb
2024-10-22
Abstract: Industrial anomaly detection is crucial for quality control and predictive maintenance, but it presents challenges due to limited training data, diverse anomaly types, and external factors that alter object appearances. Existing methods commonly detect structural anomalies, such as dents and scratches, by leveraging multi-scale features from image patches extracted through deep pre-trained networks. However, significant memory and computational demands often limit their practical application. Additionally, detecting logical anomalies, such as images with missing or excess elements, requires an understanding of spatial relationships that traditional patch-based methods fail to capture. In this work, we address these limitations by focusing on Deep Feature Reconstruction (DFR), a memory- and compute-efficient approach for detecting structural anomalies. We further enhance DFR into a unified framework, called ULSAD, which is capable of detecting both structural and logical anomalies. Specifically, we refine the DFR training objective to improve performance in structural anomaly detection, while introducing an attention-based loss mechanism using a global autoencoder-like network to handle logical anomaly detection. Our empirical evaluation across five benchmark datasets demonstrates the performance of ULSAD in detecting and localizing both structural and logical anomalies, outperforming eight state-of-the-art methods. An extensive ablation study further highlights the contribution of each component to the overall performance improvement. Our code is available at https://github.com/sukanyapatra1997/ULSAD-2024.git
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses two main problems in Industrial Anomaly Detection (IAD): detecting structural anomalies and detecting logical anomalies. Specifically:

1. **Structural Anomaly Detection**:
   - **Problem Description**: Structural anomalies are local defects in an image, such as dents and scratches. Existing methods usually detect them by extracting multi-scale features of image patches with deep pre-trained networks. However, these methods have significant memory and computational requirements, which limit their practical application.
   - **Solution**: The authors revisit Deep Feature Reconstruction (DFR), a memory- and compute-efficient approach for detecting structural anomalies. They modify the DFR training objective to combine ℓ2 distance and cosine distance, which improves structural anomaly detection performance.
2. **Logical Anomaly Detection**:
   - **Problem Description**: Logical anomalies are cases where elements in an image are missing, redundant, or violate geometric constraints. Traditional patch-based methods cannot capture such global spatial relationships, so they struggle to detect logical anomalies.
   - **Solution**: The authors introduce an attention-based loss and use a global autoencoder-like network to learn the spatial relationships in normal images. This lets the model capture the relative positions of objects in the image, so that violations of those relationships can be detected as logical anomalies.

### Unified Framework ULSAD

The authors propose a unified framework, ULSAD (Unified Logical and Structural Anomaly Detection), which detects and localizes both structural and logical anomalies. Specific improvements include:

- **Structural Anomaly Detection**: Differences between feature vectors are measured by combining ℓ2 distance and cosine distance, which improves detection performance.
\[ L_{pl}(\tilde{Z}', Z) = \frac{1}{k^*}\sum_{k = 1}^{k^*}\left( l_v(\tilde{z}_k', z_k) + \lambda_l\, l_d(\tilde{z}_k', z_k) \right) \]

where, for each pair of feature vectors, \(l_v\) is the squared ℓ2 distance and \(l_d\) is the cosine distance:

\[ l_v(\tilde{z}_k', z_k) = \|\tilde{z}_k' - z_k\|_2^2 \]

\[ l_d(\tilde{z}_k', z_k) = 1 - \frac{(\tilde{z}_k')^T z_k}{\|\tilde{z}_k'\|_2\,\|z_k\|_2} \]

- **Logical Anomaly Detection**: An attention-based loss is introduced, and a global autoencoder-like network is used to learn the spatial relationships in normal images. The loss has the same form, with its own weight \(\lambda_g\):

\[ L_{pg}(\hat{A}, A) = \frac{1}{k^*}\sum_{k = 1}^{k^*}\left( l_v(\hat{a}_k, a_k) + \lambda_g\, l_d(\hat{a}_k, a_k) \right) \]

In addition, the authors evaluate ULSAD on five benchmark datasets and use ablation studies to show how much each component contributes to overall performance.

In conclusion, this paper aims to develop a unified framework that efficiently detects and localizes structural and logical anomalies in industrial images by improving the DFR method and introducing a new loss mechanism.
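
To make the two losses above concrete, here is a minimal NumPy sketch of the shared form: the mean over feature vectors of the squared ℓ2 distance plus a weighted cosine distance. It is an illustration under stated assumptions, not the authors' implementation. The function name `combined_distance_loss`, the row-per-vector array layout, the default weight `lam=0.5`, and the `eps` guard against zero-norm vectors are all hypothetical choices made for this example.

```python
import numpy as np


def combined_distance_loss(pred, target, lam=0.5, eps=1e-8):
    """Mean over k of l_v + lam * l_d, with one feature vector per row.

    pred, target: arrays of shape (k_star, c).
    lam: weight on the cosine term (lambda_l or lambda_g in the text).
         The default is an arbitrary placeholder, not a value from the paper.
    eps: assumed guard against division by zero for zero-norm vectors.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # l_v: squared Euclidean distance between each pair of vectors.
    l_v = np.sum((pred - target) ** 2, axis=1)

    # l_d: cosine distance, 1 minus the cosine similarity of each pair.
    dot = np.sum(pred * target, axis=1)
    norms = np.linalg.norm(pred, axis=1) * np.linalg.norm(target, axis=1)
    l_d = 1.0 - dot / np.maximum(norms, eps)

    # Average the combined per-vector loss over all k_star vectors.
    return float(np.mean(l_v + lam * l_d))


if __name__ == "__main__":
    z = np.array([[1.0, 0.0], [0.0, 1.0]])
    print(combined_distance_loss(z, z))  # identical features give zero loss
```

The cosine term is insensitive to vector magnitude and the ℓ2 term is not, so a reconstruction with the right direction but the wrong scale is penalized only through \(l_v\). This is one way to read why the two distances are combined.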