Acoustic data augmentation for small passive acoustic monitoring datasets

Aime Nshimiyimana
DOI: https://doi.org/10.1007/s11042-023-17959-2
IF: 2.577
2024-01-13
Multimedia Tools and Applications
Abstract: Training complex deep neural networks from random weight initialization on small datasets can result in overfitting. Augmentation helps to reduce the negative effects of overfitting. Findings in computer vision and audio recognition research reveal that the performance of machine learning classifiers improves significantly when augmentation is used. In ecology, researchers conduct field surveys in which microphones are placed at a location and audio data are recorded over a period of time. However, there is no guarantee that the species of interest will vocalize frequently near the microphone. The amount of data captured for that species may therefore be limited, and this scarcity can cause overfitting. The main contribution of this paper is a set of experiments with time masking, frequency masking, and noise addition augmentation techniques for training visual convolutional neural networks (CNNs) repurposed for pattern recognition in acoustic spectrograms. These techniques increased the number of audio examples for the pin-tailed whydah and the Cape robin-chat in order to create robust vocalization classifiers. To evaluate the augmentation techniques, we compared experiments run with and without augmentation. We chose CNNs as our classifiers because they are state of the art in audio recognition tasks and have shown good performance. Among the augmentation techniques used, time masking achieved the highest test accuracy (90.2%), while pink noise addition produced the most successful classifier.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
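The abstract names three augmentation techniques: time masking, frequency masking, and noise addition. It does not say how they were implemented. The sketch below shows one common way to implement each of them. It rests on several assumptions that are not taken from the paper. The masks are SpecAugment-style: one random block of consecutive time frames or frequency bins is filled with a constant. The spectrogram is stored as a list of rows, one row per frequency bin. Pink noise is generated with the Voss-McCartney algorithm and added to the raw waveform at a chosen signal-to-noise ratio (SNR). The function names, mask widths, fill value, and SNR are hypothetical illustration choices, not the settings used in the study. The sketch uses only the standard library so it stays dependency-free.

```python
import math
import random
from typing import List

# Assumed layout: rows = frequency bins, columns = time frames.
Spectrogram = List[List[float]]


def time_mask(spec: Spectrogram, max_width: int, rng: random.Random,
              fill: float = 0.0) -> Spectrogram:
    """Return a copy of spec with one random run of time frames set to fill."""
    n_frames = len(spec[0])
    width = rng.randint(0, min(max_width, n_frames))   # mask width, may be 0
    start = rng.randint(0, n_frames - width)            # first masked frame
    out = [row[:] for row in spec]                      # leave the input untouched
    for row in out:
        for t in range(start, start + width):
            row[t] = fill
    return out


def freq_mask(spec: Spectrogram, max_width: int, rng: random.Random,
              fill: float = 0.0) -> Spectrogram:
    """Return a copy of spec with one random run of frequency bins set to fill."""
    n_bins = len(spec)
    width = rng.randint(0, min(max_width, n_bins))
    start = rng.randint(0, n_bins - width)
    out = [row[:] for row in spec]
    for f in range(start, start + width):
        out[f] = [fill] * len(out[f])
    return out


def pink_noise(n: int, rng: random.Random, num_rows: int = 16) -> List[float]:
    """Approximate 1/f (pink) noise in [-1, 1] with the Voss-McCartney method."""
    rows = [rng.uniform(-1.0, 1.0) for _ in range(num_rows)]
    running = sum(rows)
    out = []
    for i in range(n):
        if i > 0:
            # The number of trailing zero bits of i picks which row to refresh,
            # so row k changes every 2**k samples (lower rows change faster).
            k = (i & -i).bit_length() - 1
            if k < num_rows:
                running -= rows[k]
                rows[k] = rng.uniform(-1.0, 1.0)
                running += rows[k]
        # Add one white-noise term and normalise so the output stays in [-1, 1].
        out.append((running + rng.uniform(-1.0, 1.0)) / (num_rows + 1))
    return out


def add_noise_at_snr(signal: List[float], noise: List[float],
                     snr_db: float) -> List[float]:
    """Scale noise so that power(signal) / power(noise) equals snr_db, then add it."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    if p_noise == 0.0:
        return signal[:]
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [x + scale * v for x, v in zip(signal, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    spec = [[1.0] * 100 for _ in range(64)]             # dummy 64 x 100 spectrogram
    masked = freq_mask(time_mask(spec, 20, rng), 8, rng)
    tone = [math.sin(2 * math.pi * 440 * i / 22050) for i in range(22050)]
    noisy = add_noise_at_snr(tone, pink_noise(len(tone), rng), snr_db=10.0)
    print(len(masked), len(masked[0]), len(noisy))
```

Each masking function copies its input, so one clean spectrogram can produce several augmented variants. This matches the abstract's aim of increasing the number of training examples for a scarce species. Noise is added to the waveform here, before any spectrogram is computed. Adding it to the spectrogram directly would be an equally plausible reading of the abstract.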
What problem does this paper attempt to address?