Multimodal Industrial Anomaly Detection via Hybrid Fusion

Yue Wang, Jinlong Peng, Jiangning Zhang, Ran Yi, Yabiao Wang, Chengjie Wang
2023-09-07
Abstract: 2D-based industrial anomaly detection has been widely discussed; however, multimodal industrial anomaly detection based on 3D point clouds and RGB images remains largely unexplored. Existing multimodal industrial anomaly detection methods directly concatenate the multimodal features, which leads to strong disturbance between features and harms detection performance. In this paper, we propose Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method with a hybrid fusion scheme: first, we design an unsupervised feature fusion with patch-wise contrastive learning to encourage the interaction of different modal features; second, we use a decision layer fusion with multiple memory banks to avoid loss of information, and additional novelty classifiers to make the final decision. We further propose a point feature alignment operation to better align the point cloud and RGB features. Extensive experiments show that our multimodal industrial anomaly detection model outperforms the state-of-the-art (SOTA) methods in both detection and segmentation precision on the MVTec-3D AD dataset. Code is available at https://github.com/nomewang/M3DM.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses multimodal fusion in industrial anomaly detection: how to combine 3D point cloud data and RGB images effectively to detect defects in industrial products more accurately. Existing multimodal industrial anomaly detection methods directly concatenate multimodal features, which causes strong interference between features and impairs detection performance. To solve this problem, the paper proposes Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method based on a hybrid fusion scheme. The main contributions of M3DM are:

1. **Unsupervised Feature Fusion (UFF)**: Encourages interaction between features of different modalities through a patch-level contrastive loss, thereby maximizing the mutual information between features from different modalities at the same location.
2. **Decision-Level Fusion (DLF)**: Uses multiple memory banks, together with additional novelty classifiers, to make the final decision, which avoids information loss and improves detection accuracy.
3. **Point Feature Alignment (PFA)**: Projects 3D features onto a 2D plane to better align point cloud and RGB features, which simplifies multimodal interaction and enhances detection performance.

Experimental results show that M3DM outperforms existing state-of-the-art methods in both detection and segmentation accuracy on the MVTec-3D AD dataset.
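To make the patch-level contrastive idea more concrete, here is a minimal sketch of one common way to write such a loss. It assumes an InfoNCE-style formulation; the paper's exact loss, projection layers, and temperature are not given in this summary. The names `patch_contrastive_loss`, `rgb_feats`, `pc_feats`, and `temperature` are hypothetical and chosen only for illustration. The RGB and point cloud features of the same patch are treated as a positive pair, and features of all other patches act as negatives.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def patch_contrastive_loss(rgb_feats, pc_feats, temperature=0.1):
    """InfoNCE-style loss over patches (an assumed formulation, not the paper's code).

    rgb_feats[i] and pc_feats[i] are the features of the same spatial patch
    and form the positive pair; every pc_feats[j] with j != i is a negative.
    """
    assert len(rgb_feats) == len(pc_feats) and rgb_feats, "need matching patches"
    rgb = [l2_normalize(f) for f in rgb_feats]
    pc = [l2_normalize(f) for f in pc_feats]

    total = 0.0
    for i, r in enumerate(rgb):
        # Cosine similarity of RGB patch i with every point cloud patch.
        logits = [sum(a * b for a, b in zip(r, p)) / temperature for p in pc]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching patch as the target class.
        total += log_denom - logits[i]
    return total / len(rgb)


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[2.0, 0.0], [0.0, 3.0]]   # same directions, patch by patch
    swapped = [[0.0, 3.0], [2.0, 0.0]]   # patches matched to the wrong location
    print(patch_contrastive_loss(rgb, aligned))
    print(patch_contrastive_loss(rgb, swapped))
```

In this toy input the loss is close to zero when each RGB patch points in the same direction as its point cloud counterpart, and large when the patches are swapped, which is the behavior a patch-wise contrastive objective is meant to reward. A real implementation would operate on batched tensors from learned feature extractors.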