Abstract: Soundscape ecologists study the acoustic characteristics of an area that reflect natural processes [Schafer, 1977]. These sounds can be classified as biological (biophony), geophysical (geophony), and human-produced (anthrophony) [Pijanowski et al., 2011]. A common task is to identify species from the frequency content of a recorded signal. The signal can be converted into a spectrogram, which enables image-based analysis to automate species identification. Motivated by the promising results of deep learning methods such as Convolutional Neural Networks (CNNs) in image classification, we propose using a pre-trained VGG16 CNN architecture to identify two nocturnal avian species, Antrostomus rufus and Megascops choliba, commonly encountered in Brazilian forests. Monitoring the abundance of these species helps ecologists develop conservation programmes, detect environmental disturbances, and assess the impact of human action. Specialists recorded sounds in 16-bit wave files at a sampling rate of 44 kHz and labelled the presence of these species. From the labelled wave files, we created additional classes to evaluate the performance of the VGG16 CNN architecture in detecting both species. This yielded six categories containing 60 seconds of audio of species vocalisation combinations and background-only sounds. We produced spectrograms in three forms: using all three RGB channels, using a single channel (grey-scale), and applying histogram equalisation to the grey-scale images. Histogram equalisation improves contrast, and hence visibility to a human observer. One aim of this study was to investigate whether it also affects the performance of the CNN, so we compared system performance on histogram-equalised and unmodified images.
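To make the histogram equalisation step concrete, the following is a minimal sketch of the standard technique applied to a grey-scale image. It is not the implementation used in this study: the function name `equalise_histogram`, the list-of-rows image representation, and the 256-level assumption are all illustrative choices, and a real pipeline would more likely rely on an image-processing library.

```python
def equalise_histogram(image, levels=256):
    """Histogram-equalise a grey-scale image.

    `image` is a list of rows, each a list of integer intensities in
    [0, levels - 1]. This representation is an assumption made for the
    sketch; it is not taken from the study.
    """
    pixels = [p for row in image for p in row]
    if not pixels:
        return []

    # Count how often each intensity occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is no contrast to stretch.
        return [row[:] for row in image]

    # Map each intensity so that the output CDF is roughly linear.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[p] for p in row] for row in image]


# A low-contrast 2x2 patch is stretched over the full intensity range.
patch = [[10, 20], [30, 40]]
equalised = equalise_histogram(patch)
```

The mapping preserves the ordering of intensities while spreading them over the full range, which is why it improves contrast for a human viewer without adding information to the image.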
Moreover, to show the practical application of our work, we created a 51-minute synthetic audio file that contains more noise than vocalisations of the two species, a scenario commonly encountered in field surveys. After 8000 epochs, the trained VGG16 CNN reached a training accuracy of 100% for all three approaches. The test accuracy was 80.64%, 75.26%, and 67.74% for the RGB, grey-scale, and histogram-equalised approaches, respectively. The method's accuracy on the 51-minute synthetic audio file was 92.15%. This level of accuracy shows the potential of CNN architectures for automating species detection and identification by sound in passive monitoring. Our results suggest that representing the spectrogram as a coloured image generalises better than grey-scale or histogram-equalised images. This study may inform future avian monitoring programmes based on passive sound recording, which significantly increases sample size without increasing cost.
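As an illustration of how a long recording such as the 51-minute file might be scored, the sketch below splits a recording into consecutive windows and computes the fraction of windows labelled correctly. The 60-second window length, the helper names `window_bounds` and `accuracy`, and the example labels are assumptions made for this sketch; the abstract does not state how the long file was segmented or scored.

```python
def window_bounds(total_seconds, window_seconds=60):
    """Return (start, end) offsets in seconds for consecutive,
    non-overlapping windows covering a recording.

    The 60-second default is an assumption for illustration only.
    """
    if total_seconds < 0 or window_seconds <= 0:
        raise ValueError("durations must be positive")
    return [
        (start, min(start + window_seconds, total_seconds))
        for start in range(0, total_seconds, window_seconds)
    ]


def accuracy(predicted, expected):
    """Fraction of windows whose predicted label matches the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("label lists must have the same length")
    if not expected:
        raise ValueError("at least one window is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# A 51-minute recording yields 51 one-minute windows under this assumption.
bounds = window_bounds(51 * 60)

# Hypothetical labels for four windows, used only to exercise `accuracy`.
expected = ["background", "A. rufus", "M. choliba", "background"]
predicted = ["background", "A. rufus", "background", "background"]
score = accuracy(predicted, expected)
```

Each window would be converted to a spectrogram and passed to the classifier to obtain its predicted label. That step is omitted here because it depends on the trained model.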
