Abstract: Recent studies of multimodal industrial anomaly detection (IAD) based on 3D point clouds and RGB images have highlighted the importance of exploiting the redundancy and complementarity among modalities for accurate classification and segmentation. However, achieving multimodal IAD in practical production lines remains a work in progress. It is essential to weigh the costs and benefits of introducing new modalities while ensuring compatibility with current processes. Existing quality control processes combine rapid in-line inspections, such as optical and infrared imaging, with high-resolution but time-consuming near-line characterization techniques, including industrial CT and electron microscopy, to manually or semi-automatically locate and analyze defects in the production of Li-ion batteries and composite materials. Given cost and time limitations, only a subset of the samples can be inspected by all in-line and near-line methods; the remaining samples are evaluated through only one or two forms of in-line inspection. To fully exploit the data for deep learning-driven automatic defect detection, models must be able to leverage multimodal training and handle incomplete modalities during inference. In this paper, we propose CMDIAD, a Cross-Modal Distillation framework for IAD, to demonstrate the feasibility of a Multi-modal Training, Few-modal Inference (MTFI) pipeline. Our findings show that the MTFI pipeline uses incomplete multimodal information more effectively than training and inference on a single modality alone. Moreover, we investigate the reasons behind the asymmetric performance improvement when point clouds or RGB images serve as the main modality for inference. This provides a foundation for our future construction of multimodal datasets with additional modalities from manufacturing scenarios.
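The abstract does not describe CMDIAD's architecture, so the following is only a toy sketch of the general MTFI idea, not the authors' method. It assumes paired RGB and point-cloud feature vectors are available at training time, a linear "student" map distilled with a mean-squared-error loss, and a nearest-neighbor memory bank for scoring. All names (`train_student`, `fuse_rgb_only`, `anomaly_score`, `W_TRUE`) and the synthetic data are hypothetical.

```python
import random

rng = random.Random(0)

# Hypothetical ground-truth relation between RGB and point-cloud features.
W_TRUE = [[1.0, 0.5], [-0.5, 2.0]]

def apply_linear(W, x):
    """Matrix-vector product for small list-based matrices."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

# Multi-modal training set: both modalities are present for every sample.
rgb_train = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(50)]
pc_train = [apply_linear(W_TRUE, x) for x in rgb_train]

def train_student(rgb_feats, pc_feats, lr=0.3, epochs=300):
    """Distill point-cloud features into a linear map from RGB features (MSE loss)."""
    d_in, d_out, n = len(rgb_feats[0]), len(pc_feats[0]), len(rgb_feats)
    W = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(epochs):
        grad = [[0.0] * d_in for _ in range(d_out)]
        for x, y in zip(rgb_feats, pc_feats):
            pred = apply_linear(W, x)
            for i in range(d_out):
                err = pred[i] - y[i]
                for j in range(d_in):
                    grad[i][j] += 2.0 * err * x[j] / n
        for i in range(d_out):
            for j in range(d_in):
                W[i][j] -= lr * grad[i][j]
    return W

def fuse_rgb_only(W, x):
    """Few-modal inference: concatenate RGB features with predicted point-cloud features."""
    return x + apply_linear(W, x)

def anomaly_score(memory_bank, feat):
    """Euclidean distance to the nearest defect-free feature in the memory bank."""
    return min(sum((a - b) ** 2 for a, b in zip(m, feat)) ** 0.5 for m in memory_bank)

W = train_student(rgb_train, pc_train)

# The memory bank is built from real fused features of defect-free training samples.
memory_bank = [x + y for x, y in zip(rgb_train, pc_train)]

# At inference only the RGB feature is available.
score_normal = anomaly_score(memory_bank, fuse_rgb_only(W, rgb_train[0]))
score_anomalous = anomaly_score(memory_bank, fuse_rgb_only(W, [3.0, 3.0]))
print(score_normal, score_anomalous)
```

In this sketch the second modality is needed only while fitting `W` and building the memory bank; scoring a new sample uses the RGB feature alone, which is the property the MTFI pipeline is meant to demonstrate.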

Incomplete Multimodal Industrial Anomaly Detection via Cross-Modal Distillation
