Biodenoising: animal vocalization denoising without access to clean data

Marius Miron, Sara Keen, Jen-Yu Liu, Benjamin Hoffman, Masato Hagiwara, Olivier Pietquin, Felix Effenberger, Maddie Cusimano
2024-10-04
Abstract: Animal vocalization denoising is a task similar to human speech enhancement, a well-studied field of research. In contrast to the latter, it is applied to a higher diversity of sound production mechanisms and recording environments, and this higher diversity is a challenge for existing models. Adding to the challenge and in contrast to speech, we lack large and diverse datasets comprising clean vocalizations. As a solution we use as training data pseudo-clean targets, i.e. pre-denoised vocalizations, and segments of background noise without a vocalization. We propose a train set derived from bioacoustics datasets and repositories representing diverse species, acoustic environments, and geographic regions. Additionally, we introduce a non-overlapping benchmark set comprising clean vocalizations from different taxa and noise samples. We show that denoising models (demucs, CleanUNet) trained on pseudo-clean targets obtained with speech enhancement models achieve competitive results on the benchmarking set. We publish data, code, libraries, and demos at https://mariusmiron.com/research/biodenoising.
Sound, Audio and Speech Processing
What problem does this paper attempt to address?
The paper addresses animal vocalization denoising: removing background noise from recordings to recover clear animal calls when no clean (noise-free) data is available for training. Specifically:

1. **Task Definition**:
   - Animal vocalization denoising is similar to human speech enhancement, but it is applied to more diverse sound production mechanisms and recording environments.
   - Unlike human speech, there are no large-scale, diverse datasets of clean vocalizations, which is a major challenge.
2. **Research Motivation**:
   - Recordings of animal vocalizations are widely used to study animal communication and behavior and to monitor biodiversity.
   - Background noise makes these recordings hard to use for automated measurement of acoustic variables or for high-quality playback experiments.
3. **Main Challenges**:
   - Without clean vocalization datasets, conventional supervised training on clean targets is not directly possible.
   - The diversity and complexity of animal vocalizations make it harder for models to generalize.
4. **Solutions**:
   - Use "pseudo-clean targets": approximately clean vocalizations obtained by first denoising noisy recordings with a pre-trained speech enhancement model.
   - Train the denoising model on these pseudo-clean targets together with segments of background noise that contain no vocalization.
5. **Contributions**:
   - A manually constructed benchmark set containing clean vocalizations and noise samples.
   - A large-scale training set derived from existing bioacoustics datasets and repositories.
   - Evidence that training on pseudo-clean targets is effective when clean data is unavailable.
Through these methods, the paper aims to develop a technique that denoises effectively without clean vocalization data, and to verify that it generalizes across multiple species and recording environments.
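The training setup summarized above pairs pseudo-clean targets with vocalization-free noise segments. A minimal sketch of how one such training pair could be assembled is shown below. It assumes the noise is added to the pseudo-clean target at a chosen signal-to-noise ratio, as is common in speech enhancement training; the summary does not specify the mixing procedure, and the function name `mix_at_snr`, the 16 kHz sample rate, and the synthetic signals are all hypothetical stand-ins rather than the authors' code.

```python
import numpy as np


def mix_at_snr(pseudo_clean, noise, snr_db):
    """Return (noisy_input, target) for one training example.

    Assumption: noise is scaled so that the mixture has the requested
    SNR relative to the pseudo-clean target. This is an illustrative
    recipe, not the procedure confirmed by the paper.
    """
    # Tile and crop the noise segment so it matches the target length.
    if len(noise) < len(pseudo_clean):
        reps = int(np.ceil(len(pseudo_clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(pseudo_clean)]

    # Scale the noise power to reach the requested SNR in dB.
    signal_power = np.mean(pseudo_clean ** 2)
    noise_power = np.mean(noise ** 2)
    eps = 1e-12  # guards against division by zero for silent noise
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10) + eps))

    noisy = pseudo_clean + scale * noise
    return noisy, pseudo_clean


# Synthetic stand-ins: a 440 Hz tone plays the role of a pseudo-clean
# vocalization, and white noise plays the role of a background segment.
rng = np.random.default_rng(0)
sr = 16000  # hypothetical sample rate
t = np.arange(sr) / sr
pseudo_clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noise = rng.normal(size=sr // 2)

noisy, target = mix_at_snr(pseudo_clean, noise, snr_db=5.0)
```

In a real pipeline, `pseudo_clean` would be the output of a pre-trained speech enhancement model applied to a noisy field recording, and the denoising model would be trained to map `noisy` back to `target`.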