Abstract: Passive acoustic monitoring (PAM) allows for the study of vocal animals on temporal and spatial scales that are difficult to achieve using only human observers. Recent improvements in recording technology, data storage, and battery capacity have led to increased use of PAM. One of the main obstacles to implementing wide-scale PAM programs is the lack of open-source programs that efficiently process terabytes of sound recordings and do not require large amounts of training data. Here we describe a workflow for detecting, classifying, and visualizing female Northern grey gibbon calls in Sabah, Malaysia. Our approach detects sound events using band-limited energy summation and then performs binary classification of these events (gibbon female or not) using machine learning algorithms (support vector machine and random forest). We then applied an unsupervised approach (affinity propagation clustering) to see if we could further differentiate between true and false positives or estimate the number of gibbon females in our dataset. We used this workflow to address three questions: (1) does this automated approach provide reliable estimates of temporal patterns of gibbon calling activity; (2) can unsupervised approaches be applied as a post-processing step to improve the performance of the system; and (3) can unsupervised approaches be used to estimate how many female individuals (or clusters) there are in our study area? We found that performance plateaued with >160 clips of training data for each of our two classes. Using optimized settings, our automated approach achieved satisfactory performance (F1 score of approximately 80%). The unsupervised approach did not effectively differentiate between true and false positives, nor did it return clusters that appeared to correspond to the number of females in our study area. Our results indicate that more work needs to be done before unsupervised approaches can be reliably used to estimate the number of individual animals occupying an area from PAM data. Future work that applies these methods across sites and gibbon species, and that compares them with deep learning approaches, will be crucial for gibbon conservation initiatives across Southeast Asia.
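The detection step named in the abstract, band-limited energy summation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 500 to 1800 Hz band, the frame and hop sizes, the threshold rule (ten times the median energy), and the synthetic test signal are all assumptions chosen for illustration. The sketch sums spectral power inside a frequency band for each short frame and reports runs of frames whose energy stays above a threshold as candidate sound events. The later classification and clustering steps are not shown.

```python
import numpy as np


def band_limited_energy(signal, sample_rate, low_hz, high_hz,
                        frame_len=1024, hop=512):
    """Sum spectral power between low_hz and high_hz for each frame."""
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    n_frames = 1 + (len(signal) - frame_len) // hop
    energy = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        energy[i] = power[in_band].sum()
    return energy


def detect_events(energy, threshold, min_frames=3):
    """Return (start_frame, end_frame) pairs, end exclusive, for runs of
    at least min_frames consecutive frames above the threshold."""
    events = []
    start = None
    for i, above in enumerate(energy > threshold):
        if above and start is None:
            start = i
        elif not above and start is not None:
            if i - start >= min_frames:
                events.append((start, i))
            start = None
    # Close an event that is still open at the end of the recording.
    if start is not None and len(energy) - start >= min_frames:
        events.append((start, len(energy)))
    return events


# Synthetic 3 s recording (assumed values): faint noise plus a 1 kHz tone
# between 1.0 s and 2.0 s standing in for a call.
rng = np.random.default_rng(0)
sample_rate = 8000
t = np.arange(3 * sample_rate) / sample_rate
recording = 0.01 * rng.standard_normal(t.size)
call = (t >= 1.0) & (t < 2.0)
recording[call] += np.sin(2 * np.pi * 1000 * t[call])

hop = 512
energy = band_limited_energy(recording, sample_rate, 500, 1800, hop=hop)
threshold = 10 * np.median(energy)  # illustrative rule, not from the paper
events = detect_events(energy, threshold)
for start, end in events:
    print(f"event from {start * hop / sample_rate:.2f} s "
          f"to {end * hop / sample_rate:.2f} s")
```

In a workflow like the one described, each detected event would then be passed to a supervised classifier to decide whether it is a female gibbon call. A median-based threshold is only one possible choice, and it assumes calls occupy less than half of the recording.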

Detection and classification of vocal productions in large scale audio recordings

Automatic Sound Event Detection and Classification of Great Ape Calls Using Neural Networks

Automated Call Detection for Acoustic Surveys with Structured Calls of Varying Length

Utilizing DeepSqueak for automatic detection and classification of mammalian vocalizations: a case study on primate vocalizations

Introducing a Central African Primate Vocalisation Dataset for Automated Species Classification

Automated detection of Bornean white-bearded gibbon (Hylobates albibarbis) vocalisations using an open-source framework for deep learning

An open-source voice type classifier for child-centered daylong recordings

InfantNet: A Deep Neural Network for Analyzing Infant Vocalizations

Automatic detection and classification of marmoset vocalizations using deep and recurrent neural networks

Learning to rumble: Automated elephant call classification, detection and endpointing using deep architectures

Applying Deep Neural Network to Automatic Recognition of Giant Panda Vocalizations

Discrimination between the facial gestures of vocalising and non-vocalising lemurs and small apes using deep learning

Automatic Recognition of Giant Panda Vocalizations Using Wide Spectrum Features and Deep Neural Network

Improving Primate Sounds Classification using Binary Presorting for Deep Learning

Automated detection of gibbon calls from passive acoustic monitoring data using convolutional neural networks in the "torch for R" ecosystem

Voice Analysis in Dogs with Deep Learning: Development of a Fully Automatic Voice Analysis System for Bioacoustics Studies

A workflow for the automated detection and classification of female gibbon calls from long-term acoustic recordings

Scalable Data Annotation Pipeline for High-Quality Large Speech Datasets Development

Applying machine learning to primate bioacoustics: Review and perspectives

Research On Singing Voice Detection Based On A Long-Term Recurrent Convolutional Network With Vocal Separation And Temporal Smoothing