Few-Shot Bioacoustics Event Detection Using Transductive Inference with Data Augmentation

Farhad Banoori, Nouman Ijaz, Jinglun Shi, Khalid Khan, Xiongying Liu, Sadique Ahmad, Allam Jaya Prakash, Pawel Plawiak, Mohamed Hammad
DOI: https://doi.org/10.1109/lsens.2024.3363021
2024-01-01
IEEE Sensors Letters
Abstract: Few-shot (FS) sound event detection (SED) is the task of identifying and recognizing specific sounds or events within an audio recording, here applied to bioacoustics. The task is especially challenging because of the substantial unpredictability and complexity of the sounds and the limited amount of labeled training data available. Motivated by the recent success of transductive inference (TI) in computer vision, we propose a transductive learning technique with data augmentation. It aims to make predictions about new, unseen data by maximizing the mutual information between the features of the labeled training data and the features of the unlabeled test data. We run FS bioacoustics event detection experiments on the official DCASE-2021 and DCASE-2022 datasets from the Detection and Classification of Acoustic Scenes and Events (DCASE). The results show that the proposed approach, which combines TI with spectrogram augmentation, improves the performance of the FS bioacoustics event detection system. The proposed method improves the F-score by 0.10 and 0.16 on DCASE-2021 and DCASE-2022, respectively, compared with the average baseline and state-of-the-art methods.
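The abstract does not give the exact objective, so the following is only a minimal sketch of one common way a mutual-information term is estimated in transductive few-shot work: the entropy of the average prediction over the unlabeled samples minus the average entropy of each individual prediction. The function names `entropy` and `mutual_information`, the toy probability values, and the choice of this particular estimator are assumptions made for illustration, not the authors' implementation.

```python
import math


def entropy(dist, eps=1e-12):
    """Shannon entropy (in nats) of one discrete distribution."""
    return -sum(p * math.log(p + eps) for p in dist)


def mutual_information(probs):
    """Empirical estimate H(mean prediction) - mean(H(prediction)).

    `probs` is a list of per-sample class-probability lists for the
    unlabeled (query) samples. Assumed estimator, for illustration only.
    """
    n = len(probs)
    n_classes = len(probs[0])
    # Marginal distribution over classes, averaged across all samples.
    marginal = [sum(row[k] for row in probs) / n for k in range(n_classes)]
    # Average uncertainty of the individual predictions.
    conditional = sum(entropy(row) for row in probs) / n
    return entropy(marginal) - conditional


# Toy predictions (hypothetical values): event vs. no-event for two frames.
confident = [[1.0, 0.0], [0.0, 1.0]]   # confident and class-balanced
uncertain = [[0.5, 0.5], [0.5, 0.5]]   # maximally uncertain

mi_confident = mutual_information(confident)   # close to ln 2
mi_uncertain = mutual_information(uncertain)   # 0
```

Under this estimator, the value is largest when each prediction is confident while the predictions as a whole stay spread across classes, which is why maximizing it can help adapt a model to unlabeled test audio.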