Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics

Burooj Ghani, Vincent J. Kalkman, Bob Planqué, Willem-Pier Vellinga, Lisa Gill, Dan Stowell
2024-09-21
Abstract: Animal sounds can be recognised automatically by machine learning, and this has an important role to play in biodiversity monitoring. Yet despite increasingly impressive capabilities, bioacoustic species classifiers still exhibit imbalanced performance across species and habitats, especially in complex soundscapes. In this study, we explore the effectiveness of transfer learning in large-scale bird sound classification across various conditions, including single- and multi-label scenarios, and across different model architectures such as CNNs and Transformers. Our experiments demonstrate that both fine-tuning and knowledge distillation yield strong performance, with cross-distillation proving particularly effective in improving in-domain performance on Xeno-canto data. However, when generalizing to soundscapes, shallow fine-tuning exhibits superior performance compared to knowledge distillation, highlighting its robustness and constrained nature. Our study further investigates how to use multi-species labels, in cases where these are present but incomplete. We advocate for more comprehensive labeling practices within the animal sound community, including annotating background species and providing temporal details, to enhance the training of robust bird sound classifiers. These findings provide insights into the optimal reuse of pretrained models for advancing automatic bioacoustic recognition.
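To make the knowledge-distillation route more concrete, the following is a minimal sketch of a generic soft-target distillation objective, adapted to multi-label bird sound classification by treating each species as an independent sigmoid output. It is not the loss reported in the paper; the function names, the mixing weight `alpha` and the `temperature` value are illustrative assumptions.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_cross_entropy(p: float, target: float, eps: float = 1e-7) -> float:
    """BCE between a predicted probability p and a (possibly soft) target in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    hard_labels: Sequence[float],
    alpha: float = 0.5,        # assumed weight of the hard-label term
    temperature: float = 2.0,  # assumed softening temperature
) -> float:
    """Mix a hard-label loss with a soft-target loss from a frozen teacher.

    Each species is an independent sigmoid output (multi-label setting).
    alpha weights the hard-label term; (1 - alpha) weights the distillation term.
    """
    if not (len(student_logits) == len(teacher_logits) == len(hard_labels)):
        raise ValueError("all inputs must have one entry per species")
    n = len(student_logits)
    # Supervised term: student probabilities vs. the annotated species labels.
    hard = sum(
        binary_cross_entropy(sigmoid(s), y)
        for s, y in zip(student_logits, hard_labels)
    ) / n
    # Distillation term: temperature-softened student vs. softened teacher outputs.
    soft = sum(
        binary_cross_entropy(sigmoid(s / temperature), sigmoid(t / temperature))
        for s, t in zip(student_logits, teacher_logits)
    ) / n
    return alpha * hard + (1.0 - alpha) * soft


# Usage: a student that agrees with the teacher and labels gets a lower loss.
teacher = [3.0, -2.0, 0.5]
labels = [1.0, 0.0, 1.0]
print(distillation_loss([2.5, -1.5, 0.4], teacher, labels))
print(distillation_loss([-2.5, 1.5, -0.4], teacher, labels))
```

Many implementations also scale the soft term by the squared temperature; that factor is omitted here for simplicity. Shallow fine-tuning, by contrast, would typically keep the pretrained backbone frozen and train only a new classification layer on the hard labels, which is one plausible reading of the "constrained nature" the abstract credits for its robustness on soundscapes.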
Sound, Machine Learning, Audio and Speech Processing
What problem does this paper attempt to address?
The paper addresses how to use transfer learning effectively to improve bird sound classification, particularly in complex soundscapes. Specifically, the study explores the following points:

1. **Effectiveness of Transfer Learning Methods**: The study evaluates fine-tuning and knowledge distillation in both single-label and multi-label scenarios and across model architectures (CNNs and Transformers). Both approaches perform strongly; cross-distillation is particularly effective for in-domain performance on Xeno-canto data, while shallow fine-tuning generalizes better than knowledge distillation to soundscapes.
2. **Impact of Dataset Characteristics**: The study considers how characteristics of the Xeno-canto dataset affect model training, in particular rare species and incomplete annotation of background sounds.
3. **Utilization of Multi-Species Labels**: Multi-species labels are sometimes present but incomplete. The study investigates how to use such labels and advocates more comprehensive annotation practices within the animal sound community, including annotating background species and providing temporal details, to make bird sound classifiers more robust.

In summary, the paper aims to improve the accuracy and generalization of automatic bioacoustic recognition in complex environments by optimizing transfer learning strategies and improving dataset annotation practices.
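One way to picture the incomplete multi-species label problem is a recording that lists a foreground species and, optionally, some background species. Treating every unlisted species as a confirmed absence can penalize a model for correctly detecting an unannotated background bird. The sketch below shows a hypothetical scheme that down-weights unlisted negatives in a binary cross-entropy loss; the helpers `build_weighted_targets` and `weighted_bce`, the `unlisted_weight` parameter and the example species are illustrative assumptions, not the procedure used in the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def build_weighted_targets(
    species_index: Dict[str, int],
    foreground: str,
    background: Sequence[str],
    unlisted_weight: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Build a multi-hot target vector and per-class loss weights for one recording.

    Listed species (foreground + background) are positives with weight 1.0.
    Unlisted species are negatives weighted by unlisted_weight:
    1.0 treats them as confirmed absences; values below 1.0 hedge against
    background species the annotator did not record.
    """
    targets = [0.0] * len(species_index)
    weights = [unlisted_weight] * len(species_index)
    for name in [foreground, *background]:
        i = species_index[name]
        targets[i] = 1.0
        weights[i] = 1.0
    return targets, weights


def weighted_bce(
    logits: Sequence[float],
    targets: Sequence[float],
    weights: Sequence[float],
    eps: float = 1e-7,
) -> float:
    """Weighted binary cross-entropy over independent sigmoid outputs."""
    total = 0.0
    for z, y, w in zip(logits, targets, weights):
        # Numerically stable sigmoid.
        p = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
        p = min(max(p, eps), 1.0 - eps)
        total += w * -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(logits)


# Usage with a hypothetical three-species vocabulary.
species = {"Turdus merula": 0, "Erithacus rubecula": 1, "Fringilla coelebs": 2}
targets, weights = build_weighted_targets(
    species, "Turdus merula", ["Erithacus rubecula"], unlisted_weight=0.3
)
print(targets)  # [1.0, 1.0, 0.0]
print(weights)  # [1.0, 1.0, 0.3]
```

Setting `unlisted_weight` to 0.0 would ignore unlisted species entirely, which removes all negative evidence and lets a model predict every species as present, so an intermediate value is the more cautious choice in this sketch. Richer annotations of the kind the paper advocates (complete background species, temporal details) would reduce the need for such heuristics.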