Abstract: Detecting the presence of animal vocalisations in nature is essential for studying animal populations and their behaviors. A recent development in the field is the task known as few-shot bioacoustic sound event detection, which aims to train a versatile animal sound detector using only a small set of audio samples. Previous efforts in this area have used different architectures and data augmentation techniques to enhance model performance. However, these approaches have not fully bridged the domain gap between source and target distributions, limiting their applicability in real-world scenarios. In this work, we introduce a new dataset designed to increase the diversity and breadth of classes available for few-shot bioacoustic event detection, building on the foundations of our previous datasets. To establish a robust baseline system tailored for the DCASE 2024 Task 5 challenge, we explore a range of acoustic features and adopt negative hard sampling as our primary domain adaptation strategy. Because the challenge guidelines require each audio file to be treated independently, this approach avoids transductive learning while aiming to improve the system's adaptability to domain shift. Our experiments show that the proposed baseline system outperforms the vanilla prototypical network. Ablating individual components of the network further confirms the effectiveness of each domain adaptation method. These results highlight the potential to improve few-shot bioacoustic sound event detection by further reducing the impact of domain shift.

Learning generic feature representation with synthetic data for weakly-supervised sound event detection by inter-frame distance loss

Frame Pairwise Distance Loss for Weakly-supervised Sound Event Detection

Joint framework with deep feature distillation and adaptive focal loss for weakly supervised audio tagging and acoustic event detection

Domestic sound event detection by shift consistency mean-teacher training and adversarial domain adaptation

Weakly and semi-supervised learning for sound event detection using image pretrained convolutional recurrent neural network, weighted pooling and mean teacher method

A robust audio deepfake detection system via multi-view feature

Contrastive Loss Based Frame-wise Feature disentanglement for Polyphonic Sound Event Detection

Semi-supervised Sound Event Detection with Local and Global Consistency Regularization

A scene-dependent sound event detection approach using multi-task learning

Multi-Scale Convolutional Recurrent Neural Network with Ensemble Method for Weakly Labeled Sound Event Detection

Training Sound Event Detection On A Heterogeneous Dataset

Multitask frame-level learning for few-shot sound event detection

Learning A Self-Supervised Domain-Invariant Feature Representation for Generalized Audio Deepfake Detection

Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation

FMSG-JLESS Submission for DCASE 2024 Task4 on Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels

Mixstyle based Domain Generalization for Sound Event Detection with Heterogeneous Training Data

Sound Event Detection in Synthetic Domestic Environments

A Multi-Task Learning Framework for Sound Event Detection using High-level Acoustic Characteristics of Sounds

Mind the Domain Gap: a Systematic Analysis on Bioacoustic Sound Event Detection

Weakly supervised CRNN system for sound event detection with large-scale unlabeled in-domain data

A Joint Detection-Classification Model for Weakly Supervised Sound Event Detection Using Multi-Scale Attention Method