Abstract: Automatic detection and classification of animal sounds have many applications in biodiversity monitoring and animal behaviour. In the past twenty years, the volume of digitised wildlife sound available has massively increased, and automatic classification through deep learning now shows strong results. However, bioacoustics is not a single task but a vast range of small-scale tasks (such as individual ID, call type, emotional indication) with wide variety in data characteristics, and most bioacoustic tasks do not come with strongly-labelled training data. The standard paradigm of supervised learning, focussed on a single large-scale dataset and/or a generic pre-trained algorithm, is insufficient. In this work we recast bioacoustic sound event detection within the AI framework of few-shot learning. We adapt this framework to sound event detection, such that a system can be given the annotated start/end times of as few as 5 events, and can then detect events in long-duration audio -- even when the sound category was not known at the time of algorithm training. We introduce a collection of open datasets designed to strongly test a system's ability to perform few-shot sound event detection, and we present the results of a public contest to address the task. We show that prototypical networks are a strong-performing method when enhanced with adaptations for general characteristics of animal sounds. We demonstrate that widely-varying sound event durations are an important factor in performance, as well as non-stationarity, i.e. gradual changes in conditions throughout the duration of a recording. For fine-grained bioacoustic recognition tasks without massive annotated training data, our results demonstrate that few-shot sound event detection is a powerful new method, strongly outperforming traditional signal-processing detection methods in the fully automated scenario.
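To make the few-shot setting concrete, the following is a minimal sketch of prototype-based detection. It is not the paper's implementation, and it rests on several assumptions: each audio frame has already been mapped to an embedding vector by some trained network (not shown), a "negative" prototype is built from frames outside the annotated events, and the names `prototype`, `sq_dist`, `detect_events`, and `hop_s` are hypothetical. A frame is labelled positive when it lies closer to the mean of the 5 annotated event embeddings than to the negative prototype, and consecutive positive frames are merged into start/end times.

```python
from statistics import fmean


def prototype(embeddings):
    """Class prototype: the per-dimension mean of the support embeddings."""
    return [fmean(dim) for dim in zip(*embeddings)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def detect_events(frames, pos_support, neg_support, hop_s=1.0):
    """Return (start, end) times in seconds for runs of positive frames.

    frames      -- embeddings of the long recording, one per frame (assumed given)
    pos_support -- embeddings of the few annotated events (e.g. 5)
    neg_support -- embeddings taken from outside the annotated events (assumption)
    hop_s       -- hypothetical frame hop in seconds
    """
    pos_proto = prototype(pos_support)
    neg_proto = prototype(neg_support)
    events, start = [], None
    for i, frame in enumerate(frames):
        is_pos = sq_dist(frame, pos_proto) < sq_dist(frame, neg_proto)
        if is_pos and start is None:
            start = i  # a new event begins at this frame
        elif not is_pos and start is not None:
            events.append((start * hop_s, i * hop_s))  # event ended before frame i
            start = None
    if start is not None:  # recording ended while an event was still active
        events.append((start * hop_s, len(frames) * hop_s))
    return events


# Toy usage with made-up 2-D embeddings standing in for network outputs.
pos = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.8, 0.0], [0.9, 0.0]]
neg = [[0.0, 1.0], [0.1, 0.9], [0.0, 0.8], [0.1, 1.0], [0.0, 0.9]]
frames = [[0.0, 1.0], [0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.8, 0.2]]
print(detect_events(frames, pos, neg, hop_s=0.5))
```

This per-frame nearest-prototype rule ignores the two difficulties the abstract highlights: widely varying event durations and gradual changes in recording conditions. It should therefore be read only as an illustration of the basic idea.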

Rare Sound Event Detection Using Deep Learning and Data Augmentation

Robust Audio Sensing with Multi-Sound Classification

A Comparison of deep learning methods for environmental sound

Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study

Investigation of Data Augmentation Techniques in Environmental Sound Recognition

Multi-scale Convolutional Recurrent Neural Network and Data Augmentation for Polyphonic Sound Event Detection

A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection

Multi-task deep learning approach for sound event recognition and tracking

An Experimental Study on Sound Event Localization and Detection under Realistic Testing Conditions

Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

Multi-Scale Convolutional Recurrent Neural Network with Ensemble Method for Weakly Labeled Sound Event Detection

Sound Event Detection for Human Safety and Security in Noisy Environments

Learning to detect an animal sound from five examples

Robust sound event classification using deep neural networks

Balanced Deep CCA for Bird Vocalization Detection

Non-Negative Matrix Factorization-Convolutional Neural Network (NMF-CNN) For Sound Event Detection

Audio-Based Music Classification with DenseNet And Data Augmentation

Proposal-based Few-shot Sound Event Detection for Speech and Environmental Sounds with Perceivers

Real-Time Vehicle Sound Detection System Based on Depthwise Separable Convolution Neural Network and Spectrogram Augmentation

A CNN Sound Classification Mechanism Using Data Augmentation

Sound event localization and detection based on crnn using rectangular filters and channel rotation data augmentation