Abstract: Passive acoustic monitoring, an approach that uses autonomous acoustic recording units, allows for non-invasive monitoring of individuals, provided that individuals can be distinguished acoustically. However, identifying effective analytical approaches for individual identification remains a challenge. Our study investigates how different feature representations affect our ability to distinguish between individual female Northern grey gibbons (Hylobates funereus). We broadcast pre-recorded calls from twelve gibbon females and re-recorded the calls at varying distances (from directly under the tree to ~400 m away) using autonomous recording units. We evaluated the effectiveness of different automated feature extraction approaches for classifying gibbon calls: Mel-frequency cepstral coefficients (MFCCs), embeddings from three pre-trained neural networks (BirdNET, VGGish, and Wav2Vec2), and four commonly used acoustic indices. We used a supervised classification approach (random forest) to classify calls to the respective female and compared two unsupervised clustering approaches (affinity propagation clustering and hierarchical density-based spatial clustering) to evaluate which features were most effective for distinguishing female calls without using class labels. We used MFCCs as a baseline because previous work has shown they can be used to distinguish high-quality calls of individual gibbon females. Human annotators could only identify calls in spectrograms from recordings [...] 10 dB), while the remaining features did not perform well. Contrary to our expectations, we found that MFCCs outperformed all other features for the unsupervised clustering tasks at closer distances, and none of the features performed well at farther distances. The ability to acoustically discriminate animals under noisy conditions and from low signal-to-noise ratio calls has important implications for monitoring populations of endangered animals, such as gibbons. Focusing only on high signal-to-noise ratio calls for individual discrimination may not be possible for rare sounds, and future work should focus on developing feature extraction approaches that perform well across noisy, real-world conditions with a limited number of training samples.
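Because the abstract frames its results in terms of signal-to-noise ratio (SNR) in decibels, a short worked example may help show what such a value means. The sketch below is illustrative only and is not the authors' procedure: it assumes SNR is taken as the ratio of mean power in a call segment to mean power in a separate noise-only segment, and the function names `mean_power` and `snr_db` and the toy sample values are hypothetical.

```python
import math


def mean_power(samples):
    """Return the mean squared amplitude of a sequence of audio samples."""
    if not samples:
        raise ValueError("samples must be non-empty")
    return sum(s * s for s in samples) / len(samples)


def snr_db(call_samples, noise_samples):
    """Return SNR in decibels as 10 * log10(P_call / P_noise).

    Assumption: the call segment and the noise segment are supplied
    separately; no noise power is subtracted from the call segment.
    """
    p_noise = mean_power(noise_samples)
    if p_noise == 0:
        raise ValueError("noise power is zero; SNR is undefined")
    p_call = mean_power(call_samples)
    if p_call == 0:
        return float("-inf")  # a silent call segment has no signal power
    return 10.0 * math.log10(p_call / p_noise)


# Toy data (hypothetical): the call amplitude is 10x the noise amplitude,
# so the power ratio is 100x, which corresponds to roughly 20 dB.
noise = [0.1, -0.1, 0.1, -0.1]
call = [1.0, -1.0, 1.0, -1.0]
print(round(snr_db(call, noise), 3))
```

On this scale, 10 dB corresponds to a call segment with ten times the mean power of the background noise, and 0 dB to equal power. In practice a call segment also contains background noise, so some workflows subtract the estimated noise power before taking the ratio; this sketch does not.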

Exploring bat song syllable representations in self-supervised audio encoders

Global birdsong embeddings enable superior transfer learning for bioacoustic classification

Machine learning methods for reconstructing the acoustic fields of bat biosonar

The pale spear‐nosed bat: A neuromolecular and transgenic model for vocal learning

Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics

Transferable Models for Bioacoustics with Human Language Supervision

Automated Call Detection for Acoustic Surveys with Structured Calls of Varying Length

A Portable Terminal for Acoustic Monitoring and Online Recognition of Bats with CNN-LSTM

Bird song comparison using deep learning trained from avian perceptual judgments

An automatic analysis of ultrasound vocalisations for the prediction of interaction context in captive Egyptian fruit bats

Role of auditory feedback for vocal production learning in the Egyptian fruit bat

Bat2Web: A Framework for Real-Time Classification of Bat Species Echolocation Signals Using Audio Sensor Data

Mel-frequency cepstral coefficients outperform embeddings from pre-trained convolutional neural networks under noisy conditions for discrimination tasks of individual gibbons

An Initial study on Birdsong Re-synthesis Using Neural Vocoders

Vocal production learning in the pale spear-nosed bat, Phyllostomus discolor

animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics

Adaptive Echolocation and Flight Behaviors in Bats Can Inspire Technology Innovations for Sonar Tracking and Interception

AVN: A Deep Learning Approach for the Analysis of Birdsong

Ultrasonic Spatial Target Localization Using Artificial Pinnae of Brown Long-eared Bat

Unsupervised Auditory and Semantic Entrainment Models with Deep Neural Networks

Special Acoustical Role of Pinna Simplifying Spatial Target Localization by the Brown Long-Eared Bat Plecotus Auritus