Exploring Large Scale Pre-Trained Models for Robust Machine Anomalous Sound Detection

Bing Han, Zhiqiang Lv, Anbai Jiang, Wen Huang, Zhengyang Chen, Yufeng Deng, Jiawei Ding, Cheng Lu, Wei-Qiang Zhang, Pingyi Fan, Jia Liu, Yanmin Qian
DOI: https://doi.org/10.1109/icassp48485.2024.10447183
2024-01-01
Abstract: Machine anomalous sound detection is a useful technique for various applications, but it often generalizes poorly because data collection is difficult and acoustic environments are complex. To address this issue, we propose a robust machine anomalous sound detection model that leverages self-supervised models pre-trained on large-scale speech data. Specifically, we assign different weights to the features from different layers of the pre-trained model and then use the working condition as the label for self-supervised classification fine-tuning. Moreover, we introduce a data augmentation method that simulates different operating states of the machine to enrich the dataset. Furthermore, we devise a transformer pooling method that fuses the features of different segments. Experiments on the DCASE2023 dataset show that our proposed method outperforms the commonly used reconstruction-based autoencoder and classification-based convolutional network by a large margin, demonstrating the effectiveness of large-scale pre-training for enhancing the generalization and robustness of machine anomalous sound detection. With these methods, we achieve 2nd place in Task 2 of DCASE2023.
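The layer-weighting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the weights are a softmax over one scalar per layer (a common way to combine hidden layers of a pre-trained model), and the names `softmax`, `weighted_layer_sum`, `layer_feats`, and `layer_logits` are hypothetical. In practice the per-layer scalars would be learned during fine-tuning; here they are plain numbers.

```python
import math


def softmax(logits):
    """Turn per-layer scalars into positive weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layer_feats, layer_logits):
    """Fuse hidden states from several layers into one feature sequence.

    layer_feats:  list of layers, each a list of frames, each a list of floats
                  (shape: num_layers x num_frames x dim).
    layer_logits: one scalar per layer (assumed learnable in a real model).
    Returns a num_frames x dim list: the weighted sum over layers.
    """
    weights = softmax(layer_logits)
    num_frames = len(layer_feats[0])
    dim = len(layer_feats[0][0])
    fused = [[0.0] * dim for _ in range(num_frames)]
    for w, feats in zip(weights, layer_feats):
        for t in range(num_frames):
            for d in range(dim):
                fused[t][d] += w * feats[t][d]
    return fused


# Toy usage: two "layers" of 3 frames x 4 dims, with equal layer scalars.
layer_feats = [
    [[0.0] * 4 for _ in range(3)],
    [[1.0] * 4 for _ in range(3)],
]
fused = weighted_layer_sum(layer_feats, [0.0, 0.0])
```

With equal scalars the two layers are simply averaged; raising one layer's scalar shifts the fused feature toward that layer.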