Exploring Pre-trained General-purpose Audio Representations for Heart Murmur Detection

Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
2024-04-26
Abstract: To reduce the need for skilled clinicians in heart sound interpretation, recent studies on automating cardiac auscultation have explored deep learning approaches. However, despite the demands for large data for deep learning, the size of the heart sound datasets is limited, and no pre-trained model is available. On the contrary, many pre-trained models for general audio tasks are available as general-purpose audio representations. This study explores the potential of general-purpose audio representations pre-trained on large-scale datasets for transfer learning in heart murmur detection. Experiments on the CirCor DigiScope heart sound dataset show that the recent self-supervised learning Masked Modeling Duo (M2D) outperforms previous methods with the results of a weighted accuracy of 0.832 and an unweighted average recall of 0.713. Experiments further confirm improved performance by ensembling M2D with other models. These results demonstrate the effectiveness of general-purpose audio representation in processing heart sounds and open the way for further applications. Our code is available online which runs on a 24 GB consumer GPU at
Audio and Speech Processing, Sound
What problem does this paper attempt to address?
The paper aims to reduce the reliance on skilled clinicians in heart sound interpretation by automating cardiac auscultation. Specifically, the researchers explore whether models pre-trained on large-scale general audio datasets can be transferred to heart murmur detection. Heart sound datasets are too small to train deep learning models from scratch, and no model pre-trained on heart sounds is available, so the researchers instead leverage existing general-purpose audio representations.

The main contributions of the paper are:
1. Applying general-purpose audio representations (such as Masked Modeling Duo, M2D) to heart murmur detection and demonstrating their effectiveness compared to existing methods.
2. Finding that these representations show different performance trends on heart sounds, and that combining the models further improves overall performance.
3. Providing code online so that other researchers can reproduce the experimental results.

In experiments on the CirCor DigiScope heart sound dataset, M2D outperformed previous methods, reaching a weighted accuracy (W.acc) of 0.832 and an unweighted average recall (UAR) of 0.713, and performed particularly well in detecting the "none" category of heart murmurs. Ensembling M2D with other models improved detection performance further. These results indicate that general-purpose audio representations are effective for heart sound tasks.
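
To make the two reported metrics and the idea of ensembling concrete, here is a minimal sketch in plain Python. It is not the authors' code. The class names (`present`, `unknown`, `absent`), the class weights (5, 3, 1), and the probability-averaging ensemble are all assumptions chosen for illustration; the text above does not state how the paper weights classes or combines models.

```python
from typing import Dict, List, Sequence

# Hypothetical class names; the summary above does not list the label set.
CLASSES = ("present", "unknown", "absent")


def unweighted_average_recall(y_true: Sequence[str], y_pred: Sequence[str],
                              classes: Sequence[str] = CLASSES) -> float:
    """Mean of per-class recalls, so every class counts equally."""
    recalls: List[float] = []
    for c in classes:
        n_c = sum(1 for t in y_true if t == c)
        if n_c == 0:
            continue  # skip classes that do not occur in y_true
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / n_c)
    return sum(recalls) / len(recalls)


def weighted_accuracy(y_true: Sequence[str], y_pred: Sequence[str],
                      weights: Dict[str, float]) -> float:
    """Accuracy where each recording counts with the weight of its true class."""
    correct = sum(weights[t] for t, p in zip(y_true, y_pred) if t == p)
    total = sum(weights[t] for t in y_true)
    return correct / total


def ensemble_average(prob_sets: Sequence[Sequence[Sequence[float]]],
                     classes: Sequence[str] = CLASSES) -> List[str]:
    """Average class probabilities over models, then pick the top class.

    prob_sets[m][i][k] is model m's probability of class k for recording i.
    Averaging is one common ensembling choice, assumed here for illustration.
    """
    n_models = len(prob_sets)
    preds: List[str] = []
    for per_recording in zip(*prob_sets):
        mean = [sum(p[k] for p in per_recording) / n_models
                for k in range(len(classes))]
        preds.append(classes[mean.index(max(mean))])
    return preds


# Toy example with made-up labels and predictions.
y_true = ["present", "present", "unknown", "absent", "absent", "absent"]
y_pred = ["present", "absent", "unknown", "absent", "absent", "present"]
assumed_weights = {"present": 5.0, "unknown": 3.0, "absent": 1.0}  # assumption

uar = unweighted_average_recall(y_true, y_pred)
wacc = weighted_accuracy(y_true, y_pred, assumed_weights)

model_a = [[0.6, 0.1, 0.3], [0.2, 0.2, 0.6]]  # made-up probabilities
model_b = [[0.5, 0.1, 0.4], [0.1, 0.1, 0.8]]
ensemble_pred = ensemble_average([model_a, model_b])
```

With these toy inputs, UAR is the mean of the per-class recalls 1/2, 1/1, and 2/3, while the weighted accuracy is 10/16 because a missed `present` recording costs five times as much as a missed `absent` one under the assumed weights. This illustrates why the two metrics can rank models differently.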