ASD-Diffusion: Anomalous Sound Detection with Diffusion Models

Fengrun Zhang, Xiang Xie, Kai Guo
2024-09-24
Abstract: Unsupervised Anomalous Sound Detection (ASD) aims to design a generalizable method that can detect anomalies when only normal sounds are given. In this paper, Anomalous Sound Detection based on Diffusion Models (ASD-Diffusion) is proposed for ASD in real-world factories. In our pipeline, the anomalies in acoustic features are first reconstructed from their noise-corrupted features into their approximate normal pattern. Second, a post-processing anomaly filter algorithm is proposed to detect anomalies that exhibit significant deviation from the original input after reconstruction. Furthermore, a denoising diffusion implicit model is introduced to accelerate inference by using a longer sampling interval in the denoising process. The proposed method is innovative in its application of diffusion models as a new scheme for ASD. On the development set of DCASE 2023 Challenge Task 2, the proposed method outperforms the baseline by 7.75%, demonstrating its effectiveness.
Sound, Artificial Intelligence, Audio and Speech Processing
What problem does this paper attempt to address?
The paper addresses how to detect anomalous sounds (Anomalous Sound Detection, ASD) in industrial scenarios when only normal sound data are available. Specifically, the goal is a general method that detects anomalous sounds during machine operation without tuning the model's hyperparameters. This setting is known as "first-shot" ASD: training uses only normal sound data, and the model must detect unseen anomalous sound patterns at test time.

### Main Problems and Challenges

1. **Lack of Labeled Data**: In practice, operating conditions are diverse and abnormal situations are atypical, so it is very difficult to collect sound data that fully covers abnormal patterns.
2. **First-shot Detection**: Anomalous sounds must be detected when only normal sound data are available, without adjusting the model's hyperparameters for each machine type.
3. **High-Dimensional Time-Frequency Information Processing**: Audio signals contain complex, high-dimensional time-frequency information, and representing and processing it effectively is a challenge.

### Solutions

The paper proposes an anomalous sound detection method based on the diffusion model (ASD-Diffusion). Its main contributions are:

1. **Application of the Diffusion Model**: The diffusion model is applied to anomalous sound detection, which the paper presents as a new scheme. The model learns the distribution of normal sounds by gradually adding noise to sound features and then reconstructing the clean features.
2. **Post-processing Anomaly Filtering Algorithm**: A post-processing anomaly filtering (AF) algorithm detects significant deviations between reconstructed samples and original samples in order to locate abnormal regions.
3. **Accelerating Inference Speed**: The denoising diffusion implicit model (DDIM) accelerates inference by increasing the sampling interval while maintaining good detection performance.

### Method Overview

- **Forward Diffusion Process**: Noise is gradually added to normal sound features to generate noisy features.
- **Reverse Denoising Process**: A trained neural network predicts the noise in the noisy features, and the noise is gradually removed to reconstruct sound features close to the normal pattern.
- **Anomaly Detection**: Anomalous sounds are detected by comparing the original sample with the reconstructed sample, for example using the mean squared error or the absolute error.
- **Post-processing**: The AF algorithm further filters out abnormal regions to improve detection accuracy.

### Experimental Results

ASD-Diffusion outperforms the baseline method by 7.75% on the development set of DCASE 2023 Challenge Task 2. The gain is most pronounced in the target domain, even though only a small amount of normal target-domain audio is provided.

### Conclusion

The paper demonstrates the effectiveness and potential of the diffusion model for anomalous sound detection, especially in the first-shot scenario. Future work will further explore unsupervised methods and aim for better anomaly localization.
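
The forward noising, strided (DDIM-style) denoising, and reconstruction-error scoring described in the method overview can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The linear beta schedule, the starting step `start_t`, the `stride`, and every function name are assumptions made for illustration, and `eps_model` is a hypothetical stand-in for the trained noise-prediction network. The AF post-processing step is not reproduced because its details are not given in this summary.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def forward_noise(x0, t, alpha_bar, rng):
    """Closed-form forward diffusion: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


def ddim_timesteps(start_t, stride):
    """Descending timesteps visited when sampling with a fixed interval."""
    return list(range(start_t, -1, -stride))


def ddim_step(xt, eps_pred, t, t_prev, alpha_bar):
    """One deterministic DDIM update (eta = 0) from step t to step t_prev."""
    a_t = alpha_bar[t]
    a_prev = alpha_bar[t_prev] if t_prev >= 0 else 1.0  # t_prev = -1 means clean data
    x0_pred = (xt - np.sqrt(1.0 - a_t) * eps_pred) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps_pred


def reconstruct(x0, eps_model, alpha_bar, start_t, stride, rng):
    """Corrupt x0 up to start_t, then denoise it with strided DDIM steps."""
    xt, _ = forward_noise(x0, start_t, alpha_bar, rng)
    steps = ddim_timesteps(start_t, stride)
    for t, t_prev in zip(steps, steps[1:] + [-1]):
        xt = ddim_step(xt, eps_model(xt, t), t, t_prev, alpha_bar)
    return xt


def anomaly_score(x, x_rec):
    """Mean squared error between the input features and their reconstruction."""
    return float(np.mean((x - x_rec) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = make_alpha_bar()
    features = rng.standard_normal((8, 16))  # placeholder for acoustic features

    # Hypothetical "perfect" network: recovers the exact noise because it is
    # given the clean features. A real model only approximates this for
    # normal sounds, which is why anomalies reconstruct poorly.
    def oracle(xt, t):
        return (xt - np.sqrt(alpha_bar[t]) * features) / np.sqrt(1.0 - alpha_bar[t])

    rec = reconstruct(features, oracle, alpha_bar, start_t=500, stride=50, rng=rng)
    print("anomaly score:", anomaly_score(features, rec))
```

With `stride=50` the sampler visits 11 timesteps instead of the 501 that a step-by-step pass from `start_t=500` would need, which is the kind of saving a longer sampling interval provides. In practice a larger stride trades reconstruction fidelity for speed, so the interval would need to be chosen empirically.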