DP-MAE: A Dual-Path Masked Autoencoder Based Self-Supervised Learning Method for Anomalous Sound Detection

Zhuo-Li Liu, Yan Song, Xiao-Min Zeng, Li-Rong Dai, Ian McLoughlin
DOI: https://doi.org/10.1109/icassp48485.2024.10447859
2024-01-01
Abstract: In this paper, we present a novel general-purpose audio representation learning method, the Dual-Path Masked AutoEncoder (DP-MAE), for the anomalous sound detection (ASD) task. Existing methods mainly rely on frame-level generative or clip-level discriminative approaches, which generally ignore the local information where anomalies are usually easier to find. Moreover, they apply multiple systems to a single ASD task, which limits generalizability. To tackle this, our method extracts patch-level features through self-supervised representation learning, yielding a unified audio representation that generalizes well and models the local information that helps detect anomalies under domain shifts. It then further optimizes the informativeness of clip-level representations during fine-tuning. Concretely, the input spectrograms are randomly split into two patch-level subsets, which are fed into DP-MAE to predict each other. Meanwhile, the output of one path also serves as the prediction target of the other path, providing regularization from the perspective of self-distillation. In the fine-tuning stage, a linear classifier is applied to the features produced by the encoder to obtain a more compact representation of normal sound. Experiments on the DCASE 2022 Challenge Task 2 development dataset show the effectiveness of our method.
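The random dual-path split described in the abstract can be illustrated with a minimal NumPy sketch. It covers only the data-preparation step, not the encoder, decoder, or self-distillation loss. The function names (patchify, dual_path_split), the 16x16 patch size, the 128x64 spectrogram shape, and the 50/50 split ratio are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def patchify(spec, patch):
    """Cut a (freq, time) spectrogram into non-overlapping patch x patch tiles,
    returned as rows of a (num_patches, patch * patch) array."""
    f, t = spec.shape
    assert f % patch == 0 and t % patch == 0, "shape must be divisible by patch"
    tiles = spec.reshape(f // patch, patch, t // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def dual_path_split(num_patches, rng, ratio=0.5):
    """Randomly partition patch indices into two complementary subsets.
    The 0.5 ratio is an assumed value, not one stated in the abstract."""
    perm = rng.permutation(num_patches)
    k = int(num_patches * ratio)
    return np.sort(perm[:k]), np.sort(perm[k:])

rng = np.random.default_rng(0)
spec = rng.standard_normal((128, 64))      # stand-in for a log-mel spectrogram
patches = patchify(spec, patch=16)         # 8 x 4 = 32 patches of 256 values
idx_a, idx_b = dual_path_split(len(patches), rng)

# Path A sees subset A and is asked to predict subset B; path B does the reverse.
visible_a, target_a = patches[idx_a], patches[idx_b]
visible_b, target_b = patches[idx_b], patches[idx_a]
```

Because the two index sets are complementary, every patch is visible to exactly one path and is a prediction target for the other, which is what lets the two paths "predict each other".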