Auto deep learning for bioacoustic signals

Giulio Tosato, Abdelrahman Shehata, Joshua Janssen, Kees Kamp, Pramatya Jati, Dan Stowell
2023-12-26
Abstract: This study investigates the potential of automated deep learning to enhance the accuracy and efficiency of multi-class classification of bird vocalizations, compared against traditional manually-designed deep learning models. Using the Western Mediterranean Wetland Birds dataset, we investigated the use of AutoKeras, an automated machine learning framework, to automate neural architecture search and hyperparameter tuning. Comparative analysis validates our hypothesis that the AutoKeras-derived model consistently outperforms traditional models like MobileNet, ResNet50 and VGG16. Our approach and findings underscore the transformative potential of automated deep learning for advancing bioacoustics research and models. In fact, the automated techniques eliminate the need for manual feature engineering and model design while improving performance. This study illuminates best practices in sampling, evaluation and reporting to enhance reproducibility in this nascent field. All the code used is available at https://github.com/giuliotosato/AutoKeras-bioacustic
Keywords: AutoKeras; automated deep learning; audio classification; Wetlands Bird dataset; comparative analysis; bioacoustics; validation dataset; multi-class classification; spectrograms.
Machine Learning, Artificial Intelligence, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
This paper aims to improve the accuracy and efficiency of multi-class bird song classification through automated deep learning, compared with manually designed deep-learning models. Specifically, the study uses the Western Mediterranean Wetland Birds dataset to explore AutoKeras, an automated machine-learning framework, for neural architecture search and hyperparameter tuning.

### Research Background

- **Importance of Ecological Monitoring**: Changes in the composition of bird communities and in the abundance of specific species can serve as reliable indicators of overall ecosystem health.
- **Limitations of Traditional Methods**: Bioacoustic research has historically relied on experts to identify bird species manually. With the emergence of machine learning and large-scale audio datasets, automatic sound identification through deep learning has become feasible and popular.
- **Challenges**: Building high-precision deep-learning models is difficult. It requires large amounts of labeled data, involves high model complexity, and demands tuning of many architecture design choices, such as network topology and hyperparameters.

### Research Objectives

- **Advantages of Automated Deep Learning**: The study tests whether automated deep-learning methods such as AutoKeras can outperform manually designed models (MobileNet, ResNet50, and VGG16) on multi-class bird song classification.
- **Reducing Manual Feature Engineering**: Automation removes the need for manual feature engineering while improving performance.
- **Best Practices**: The study also recommends practices for sampling, evaluation, and reporting that improve reproducibility in this emerging field.

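
The AutoKeras search mentioned above could be set up roughly as follows. This is a minimal sketch, not the authors' pipeline: it assumes the recordings have already been rendered as spectrogram image arrays and uses AutoKeras's `ImageClassifier` task API. The function name `search_bird_classifier`, the trial budget, and the epoch count are illustrative assumptions. The import sits inside the function so the definition loads even where AutoKeras is not installed.

```python
def search_bird_classifier(x_train, y_train, x_val, y_val,
                           max_trials=10, epochs=20, seed=0):
    """Run an AutoKeras architecture search on spectrogram images.

    x_* are assumed to be NumPy arrays of shape (n, height, width, channels);
    y_* are class labels. All default values are illustrative, not taken
    from the paper.
    """
    import autokeras as ak  # lazy import; requires `pip install autokeras`

    clf = ak.ImageClassifier(
        max_trials=max_trials,  # number of candidate models to try
        overwrite=True,         # discard results of any earlier search
        seed=seed,
    )
    # An explicit validation set makes the search score candidates on our
    # own split instead of AutoKeras's default hold-out from the training data.
    clf.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=epochs)
    # export_model() returns the best model found as a Keras model.
    return clf, clf.export_model()
```

The returned classifier can then be scored on a held-out test set with `clf.evaluate(x_test, y_test)`, which keeps the final number independent of the data used to select the architecture.
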
### Methods

- **Dataset**: The study uses the Western Mediterranean Wetland Birds dataset, which contains the songs of 20 native bird species.
- **Data Pre-processing**: A stratified sampling method divides the dataset into training, validation, and test sets, so that each class is appropriately represented in each set.
- **Model Comparison**: Three pre-trained models (MobileNet V2, VGG16, ResNet50) are compared with the best model found by the AutoKeras search.

### Results

- **Superiority of the AutoKeras Model**: The Xception model obtained through the AutoKeras search outperforms the three baseline models on both the validation and test sets.
- **Confusion Matrix Analysis**: All models misclassify certain classes, which suggests that these classes share similar acoustic characteristics or have lower data quality.

### Discussion

- **Importance of Data Pre-processing**: The stratified sampling strategy accounts for differing session lengths and significantly improves the model's generalization ability and the accuracy of its evaluation.
- **Necessity of Model Generalization**: An independent test set is needed to verify that the model generalizes beyond the training and validation data.

### Conclusions

- **Potential of Automated Deep Learning**: The deep-learning approach automated by AutoKeras performs well on multi-class bird song classification and reduces the need to design and optimize neural network models by hand.
- **Crucial Role of Data Pre-processing**: A sound data pre-processing strategy is crucial for handling imbalanced multi-class datasets and can significantly improve model performance.
- **Transparency and Reproducibility**: Research reports should document the sampling method in detail and provide complete confusion matrix data to improve transparency and reproducibility.

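
The stratified train/validation/test split described above can be sketched in plain Python. This is a simplified illustration under stated assumptions: it stratifies by class label only and does not model the session-length effects the study accounts for. The function name `stratified_split`, the 70/15/15 proportions, and the `species_a`/`species_b` labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, percents=(70, 15, 15), seed=0):
    """Return (train, val, test) index lists, split class by class.

    percents are integer percentages for train and validation; whatever
    remains of each class goes to the test set.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):       # sorted for a deterministic order
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = len(idxs) * percents[0] // 100
        n_val = len(idxs) * percents[1] // 100
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]   # remainder of this class
    return train, val, test

# Hypothetical imbalanced toy data: 20 clips of one species, 40 of another.
labels = ["species_a"] * 20 + ["species_b"] * 40
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Because every class is split separately, a rare species keeps roughly the same share in each subset as in the full dataset, which a single random split over all clips does not guarantee.
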

This research not only demonstrates the potential of automated deep learning in bioacoustic research but also provides methodological guidance for future work.
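
The recommendation above to report complete confusion matrices can be illustrated with a small sketch. It assumes true and predicted labels are available as plain lists; the helper names `confusion_matrix` and `per_class_recall` and the toy labels `a`, `b`, `c` are hypothetical and not taken from the paper's code.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Count predictions: rows are true classes, columns are predicted."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]

# Toy labels standing in for bird species.
classes = ["a", "b", "c"]
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]

cm = confusion_matrix(y_true, y_pred, classes)
recall = per_class_recall(cm)
for name, row, r in zip(classes, cm, recall):
    print(name, row, f"recall={r:.2f}")
```

Publishing the full matrix rather than a single accuracy figure shows which pairs of classes are confused with each other, which is the information needed to tell acoustically similar species apart from classes with poor data quality.
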