Abstract: Soundscape ecologists aim to study the acoustic characteristics of an area that reflect natural processes [Schafer, 1977]. These sounds can be classified as biological (biophony), geophysical (geophony), or human-produced (anthrophony) [Pijanowski et al., 2011]. A common task is to use sounds to identify species based on the frequency content of a given signal. Such signals can be converted into spectrograms, which enable image-based analyses that automate species identification. Building on the promising results of deep learning methods such as Convolutional Neural Networks (CNNs) in image classification, we propose using a pre-trained VGG16 CNN architecture to identify two nocturnal avian species commonly encountered in Brazilian forests, Antrostomus rufus and Megascops choliba. Monitoring the abundance of these species helps ecologists develop conservation programmes, detect environmental disturbances and assess the impact of human activity. Specialists recorded sounds as 16-bit WAV files at a sampling rate of 44 kHz and annotated the presence of these species. From the annotated files, we created additional classes to evaluate how well the VGG16 CNN architecture detects both species. This yielded six categories of 60-second audio clips covering combinations of species vocalisations as well as background-only sounds. We produced spectrograms using the information from all three RGB channels, from a single channel (grey-scale), and from grey-scale images after histogram equalisation. Histogram equalisation improves contrast and hence visibility to a human observer; because investigating its effect on CNN performance was one aim of this study, we compared system performance on histogram-equalised and unmodified images.
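The grey-scale and histogram-equalised inputs described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the authors' pipeline: the STFT parameters (`n_fft=1024`, `hop=512`), the dB scaling, the min-max mapping to 8-bit grey levels and all function names are hypothetical choices of ours. Equalisation uses the classic global mapping lut(v) = round((cdf(v) - cdf_min) / (N - cdf_min) * 255), where N is the number of pixels. The RGB variant would additionally pass the grey levels through a colour map so that VGG16 receives three distinct channels.

```python
import numpy as np


def spectrogram_db(signal, n_fft=1024, hop=512):
    """Magnitude spectrogram in dB from a Hann-windowed STFT (assumed parameters)."""
    if len(signal) < n_fft:
        raise ValueError("signal shorter than one FFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq bins, time frames)
    return 20 * np.log10(mag + 1e-10)            # small offset avoids log(0)


def to_uint8(img):
    """Linearly rescale an array to 0-255 grey levels (one grey-scale channel)."""
    lo, hi = img.min(), img.max()
    scaled = (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
    return np.round(scaled * 255).astype(np.uint8)


def equalise_histogram(grey):
    """Global histogram equalisation of an 8-bit grey-scale image."""
    hist = np.bincount(grey.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]          # cumulative count at the darkest level present
    total = grey.size
    if total == cdf_min:               # constant image: nothing to stretch
        return grey.copy()
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[grey]                   # map every pixel through the lookup table


if __name__ == "__main__":
    sr = 44_100                        # assumed sampling rate for the demo
    rng = np.random.default_rng(0)
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 1_000 * t) + 0.01 * rng.standard_normal(sr)
    grey = to_uint8(spectrogram_db(tone))
    equalised = equalise_histogram(grey)
    print(grey.shape, equalised.dtype)
```

Equalisation spreads the occupied grey levels over the full 0-255 range, which is why it helps a human observer; whether it helps a CNN is exactly the empirical question the study addresses.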
Moreover, to demonstrate the practical application of our work, we created a 51-minute audio file that contains more background noise than vocalisations of the two species (a scenario commonly encountered in field surveys). After 8000 epochs, the trained VGG16 CNN reached a training accuracy of 100% for all three approaches. The test accuracy was 80.64%, 75.26%, and 67.74% for the RGB, grey-scale, and histogram-equalised approaches, respectively. On the 51-minute synthetic audio file, the method achieved an accuracy of 92.15%. This level of accuracy reveals the potential of CNN architectures to automate species detection and identification from sound recorded by passive monitoring. Our results suggest that representing spectrograms as colour images generalises the classification better than grey-scale or histogram-equalised images. This study may inform future avian monitoring programmes based on passive sound recording, which can substantially increase sample size without increasing cost.
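Evaluation on the long synthetic file can be pictured as window-level scoring. The sketch below assumes, hypothetically, that the recording is cut into non-overlapping 60-second windows (matching the clip length of the training categories), that each window is classified independently, and that accuracy is the fraction of windows whose prediction matches a reference label. The `classify` argument stands in for the trained VGG16 model; the helper names and the toy energy classifier are illustrative only and are not the authors' code.

```python
import numpy as np


def split_into_windows(signal, sr, window_s=60):
    """Split a 1-D recording into non-overlapping windows of window_s seconds.

    A trailing partial window is dropped (an assumption of this sketch)."""
    win = int(sr * window_s)
    n = len(signal) // win
    return signal[: n * win].reshape(n, win)


def windowed_accuracy(signal, sr, labels, classify, window_s=60):
    """Fraction of windows whose predicted class equals the reference label."""
    windows = split_into_windows(signal, sr, window_s)
    if len(windows) != len(labels):
        raise ValueError("need exactly one reference label per window")
    preds = [classify(w) for w in windows]
    return float(np.mean([p == y for p, y in zip(preds, labels)]))


def toy_energy_classifier(window, threshold=0.5):
    """Hypothetical stand-in for the CNN: flags high-energy windows as calls."""
    return "call" if np.abs(window).mean() > threshold else "background"
```

Under this per-minute assumption a 51-minute file yields 51 windows, so each window is worth about 1.96 percentage points of accuracy; 47 correct windows, for instance, give 47/51 ≈ 0.9216. The abstract does not state how the synthetic file was scored, so this arithmetic is illustrative only.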