Abstract: Anomaly detection has recently gained increasing attention in the field of computer vision, likely due to its broad set of applications ranging from product fault detection on industrial production lines and impending event detection in video surveillance to finding lesions in medical scans. Regardless of the domain, anomaly detection is typically framed as a one-class classification task, where the learning is conducted on normal examples only. An entire family of successful anomaly detection methods is based on learning to reconstruct masked normal inputs (e.g. patches, future frames, etc.) and using the magnitude of the reconstruction error as an indicator of the abnormality level. Unlike other reconstruction-based methods, we present a novel self-supervised masked convolutional transformer block (SSMCTB) that incorporates the reconstruction-based functionality at a core architectural level. The proposed self-supervised block is extremely flexible, enabling information masking at any layer of a neural network and being compatible with a wide range of neural architectures. In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss. Furthermore, we show that our block is applicable to a wider variety of tasks, adding anomaly detection in medical images and thermal videos to the previously considered tasks based on RGB images and surveillance videos. We exhibit the generality and flexibility of SSMCTB by integrating it into multiple state-of-the-art neural models for anomaly detection, bringing forth empirical results that confirm considerable performance improvements on five benchmarks. We release our code and data as open source at: https://github.com/ristea/ssmctb.

What problem does this paper attempt to address?

The paper addresses anomaly detection in computer vision. Specifically, the authors propose the self-supervised masked convolutional transformer block (SSMCTB), which learns through self-supervision to reconstruct masked information and uses the reconstruction as the basis for anomaly detection. The main contributions of SSMCTB are:

1. **Masked convolution operation**: A mask is applied to the central region of the convolutional kernel, so the network must rely on the surrounding visible information to reconstruct the masked part. This teaches the network to recover missing or occluded content from context.
2. **Integration into neural networks**: SSMCTB can be embedded as an independent module into various existing architectures, both convolution-based and transformer-based, to improve their anomaly detection performance.
3. **Extension to 3D convolution**: In addition to the standard 2D masked convolution, the block supports 3D masked convolution, so it can be applied to 3D inputs such as video data.
4. **Multi-head self-attention**: Compared with the earlier SSPCAB, SSMCTB replaces the simple channel attention module with a multi-head self-attention (transformer) module for channel-wise attention, which increases the block's modeling capacity.
5. **Improved loss function**: The self-supervised objective uses Huber loss instead of mean squared error (MSE) loss, which improves robustness to outliers.

The experiments evaluate SSMCTB on five benchmarks covering image and video data from domains such as industrial inspection and medicine. The results show that integrating SSMCTB into existing state-of-the-art models brings considerable improvements in anomaly detection performance.
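The masked convolution idea from the contribution list can be illustrated with a minimal, single-channel sketch in plain Python. This is not the authors' implementation: the helper names `center_mask` and `masked_conv2d`, the 3x3 kernel, and the choice to hide only the single central element are assumptions made for illustration, and the mask layout used in the paper may differ.

```python
from typing import List

Matrix = List[List[float]]


def center_mask(size: int, masked: int = 1) -> Matrix:
    """Binary mask: 0 on the central `masked` x `masked` square, 1 elsewhere.

    Hypothetical helper; the mask layout here is an illustrative assumption.
    """
    if size % 2 == 0 or masked % 2 == 0 or masked > size:
        raise ValueError("size and masked must be odd, with masked <= size")
    lo = (size - masked) // 2
    hi = lo + masked
    return [[0.0 if lo <= r < hi and lo <= c < hi else 1.0
             for c in range(size)] for r in range(size)]


def masked_conv2d(image: Matrix, kernel: Matrix, mask: Matrix) -> Matrix:
    """Same-size cross-correlation with zero padding, using kernel * mask.

    Because the central weights are multiplied by 0, each output value is
    computed only from the neighbours of the corresponding input position.
    """
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < h and 0 <= cc < w:
                        total += image[rr][cc] * kernel[i][j] * mask[i][j]
            out[r][c] = total
    return out


# Toy example: average the 8 visible neighbours to predict the hidden centre.
mask = center_mask(3)
kernel = [[1.0 / 8.0] * 3 for _ in range(3)]
image = [[1.0, 1.0, 1.0],
         [1.0, 9.0, 1.0],
         [1.0, 1.0, 1.0]]
prediction = masked_conv2d(image, kernel, mask)
center_error = abs(image[1][1] - prediction[1][1])
```

In this toy input the central pixel (9.0) does not match its context, so the value predicted from the neighbours is far from the true value and `center_error` is large. A large reconstruction error of this kind is what reconstruction-based methods treat as a sign of abnormality.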
In summary, the paper proposes SSMCTB, a flexible self-supervised block that can be added to existing anomaly detection models, and demonstrates its applicability across image and video tasks in several domains.
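To make the robustness claim about Huber loss concrete, the following sketch compares the standard Huber loss with MSE on a toy reconstruction that contains one outlier. The function names and the threshold `delta = 1.0` are assumptions for illustration; the source does not state the exact formulation or hyperparameters used in the paper.

```python
from typing import Sequence


def huber(residual: float, delta: float = 1.0) -> float:
    """Standard Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def mean_huber(pred: Sequence[float], target: Sequence[float],
               delta: float = 1.0) -> float:
    """Mean Huber loss over paired predictions and targets."""
    return sum(huber(p - t, delta) for p, t in zip(pred, target)) / len(pred)


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Toy reconstruction with one large outlier (residual of 10 in the last entry).
target = [1.0, 1.0, 1.0, 1.0]
pred = [1.5, 1.0, 0.5, 11.0]

huber_value = mean_huber(pred, target)        # 2.4375
mse_value = mean_squared_error(pred, target)  # 25.125
```

The single outlier contributes 100 to the sum of squared errors but only 9.5 to the Huber sum, because Huber loss grows linearly once the residual exceeds `delta`. This limits the influence of outliers on the training signal.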

Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection

Self-Supervised Anomaly Detection from Anomalous Training Data via Iterative Latent Token Masking

Masked Transformer for image Anomaly Localization

MSTAD: A masked subspace-like transformer for multi-class anomaly detection

Anomaly Detection Based on a 3D Convolutional Neural Network Combining Convolutional Block Attention Module Using Merged Frames

Enhancing Video Anomaly Detection Using a Transformer Spatiotemporal Attention Unsupervised Framework for Large Datasets

Mask2Anomaly: Mask Transformer for Universal Open-set Segmentation

SMCNet: Sparse-Inspired Masked Convolutional Network for Hyperspectral Anomaly Detection

Anomaly detection in surveillance videos using Transformer with margin learning

Masked Swin Transformer Unet for Industrial Anomaly Detection

A Novel MAE-Based Self-Supervised Anomaly Detection and Localization Method

3D Masked Autoencoders with Application to Anomaly Detection in Non-Contrast Enhanced Breast MRI

Unsupervised Anomaly Detection in Medical Images with a Memory-augmented Multi-level Cross-attentional Masked Autoencoder

A Transformer Architecture based mutual attention for Image Anomaly Detection

UTRAD: Anomaly detection and localization with U-Transformer

DMU-TransNet: Dense multi-scale U-shape transformer network for anomaly detection

Anomaly Detection for Medical Images using Heterogeneous Auto-Encoder

Self-Supervised Masking for Unsupervised Anomaly Detection and Localization

ADTR: Anomaly Detection Transformer with Feature Reconstruction