Abstract: Unsupervised anomalous sound detection, and self-supervised methods in particular, plays a crucial role in distinguishing unknown abnormal machine sounds from normal ones. Self-supervised learning falls into two main categories: generative and contrastive methods. Generative methods focus mainly on reconstructing data, whereas contrastive methods refine data representations by contrasting each sample with its augmented version. However, existing contrastive learning methods for anomalous sound detection have two main problems. First, they mostly rely on simple augmentation techniques, such as time or frequency masking, which may introduce bias because they do not reflect the diversity of real-world sounds and noises encountered in practice (e.g., factory noise mixed with machine sounds). Second, they suffer from dimensional collapse, which yields a feature space with limited representational capacity. To address the first problem, we propose a diffusion-based data augmentation method that employs ChatGPT and AudioLDM. To address the second, we propose a two-stage self-supervised model. In the first stage, we introduce a novel approach that combines contrastive learning and masked autoencoders to pre-train on the MIMII and ToyADMOS2 datasets. This combination allows the model to capture both global and local features, leading to a more comprehensive representation of the data. In the second stage, we refine the audio representations for each machine ID by fine-tuning the pre-trained model with supervised contrastive learning, which strengthens the relationship between audio features originating from the same machine ID. Experiments show that our method outperforms most state-of-the-art self-supervised learning methods. The proposed model achieves an average AUC of 94.39% and an average pAUC of 87.93% on the DCASE 2020 Challenge Task 2 dataset.
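
The second-stage idea, pulling together audio features that share a machine ID, can be illustrated with a generic supervised contrastive loss. The following is a minimal NumPy sketch, not the authors' implementation: the function name `supervised_contrastive_loss`, the temperature of 0.1, and the toy embeddings are all assumptions made for illustration.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, machine_ids, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, hypothetical helper).

    Samples that share a machine ID are treated as positives for each other;
    all remaining samples in the batch act as negatives.
    """
    z = np.asarray(embeddings, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalise each row
    sim = z @ z.T / temperature                        # scaled cosine similarity

    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)            # exclude self-comparisons

    # Numerically stable log-softmax over the other samples in the batch.
    sim_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    log_prob = sim - sim_max - log_norm

    ids = np.asarray(machine_ids)
    pos_mask = (ids[:, None] == ids[None, :]) & ~self_mask
    n_pos = pos_mask.sum(axis=1)
    valid = n_pos > 0                                  # anchors with at least one positive
    if not valid.any():
        raise ValueError("each batch needs at least two samples sharing a machine ID")

    # Average negative log-probability of the positives, per anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[valid] / n_pos[valid]
    return float(per_anchor.mean())

# Toy embeddings: two tight clusters in 2-D (assumed data, for illustration only).
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_matching_ids = supervised_contrastive_loss(toy, [0, 0, 1, 1])
loss_mismatched_ids = supervised_contrastive_loss(toy, [0, 1, 0, 1])
```

When the machine IDs agree with the cluster structure of the embeddings, the loss is close to zero; when the IDs cut across the clusters, it is much larger. Minimising a loss of this kind is what draws same-ID features together during fine-tuning.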