Deep Neural Network Identification of Limnonectes Species and New Class Detection Using Image Data

Li Xu, Yili Hong, Eric P. Smith, David S. McLeod, Xinwei Deng, Laura J. Freeman
2023-11-15
Abstract: As is true of many complex tasks, the work of discovering, describing, and understanding the diversity of life on Earth (viz., biological systematics and taxonomy) requires many tools. Some of this work can be accomplished as it has been done in the past, but some aspects present us with challenges which traditional knowledge and tools cannot adequately resolve. One such challenge is presented by species complexes in which the morphological similarities among the group members make it difficult to reliably identify known species and detect new ones. We address this challenge by developing new tools using the principles of machine learning to resolve two specific questions related to species complexes. The first question is formulated as a classification problem in statistics and machine learning and the second question is an out-of-distribution (OOD) detection problem. We apply these tools to a species complex comprising Southeast Asian stream frogs (Limnonectes kuhlii complex) and employ a morphological character (hind limb skin texture) traditionally treated qualitatively in a quantitative and objective manner. We demonstrate that deep neural networks can successfully automate the classification of an image into a known species group for which it has been trained. We further demonstrate that the algorithm can successfully classify an image into a new class if the image does not belong to the existing classes. Additionally, we use the larger MNIST dataset to test the performance of our OOD detection algorithm. We finish our paper with some concluding remarks regarding the application of these methods to species complexes and our efforts to document true biodiversity. This paper has online supplementary materials.
Machine Learning, Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address two key issues in biosystematics and taxonomy, particularly for species complexes with high morphological similarity:

1. **Classification of Known Species**:
   - This problem is formalized as an image classification problem in statistics and machine learning. The researchers develop a method based on deep neural networks that automatically classifies images into known species groups, using the hind limb skin texture of frogs as the morphological feature.
2. **Detection of New Species**:
   - This problem is formalized as an Out-of-Distribution (OOD) detection problem. The researchers aim to develop a tool that identifies images that do not belong to any known category, thereby flagging potential new species.

### Background and Motivation

- **Biodiversity Crisis**: The species discovered, identified, and described so far represent only a small fraction of all life on Earth. Because of human-driven factors such as habitat loss and climate change, biodiversity is disappearing faster than it can be documented.
- **Limitations of Traditional Methods**: Traditional species identification relies on the expertise of taxonomists, who recognize new species through their understanding of morphology and natural history. For species complexes with high morphological similarity, however, traditional methods struggle to distinguish known species reliably and to detect new ones.
- **Application of New Technologies**: In recent years, tools such as DNA sequencing and phylogenetic analysis have accelerated species discovery, but they have also revealed how much true diversity is masked by morphological similarity. Deep learning and machine learning offer new possibilities for addressing these issues.

### Research Subjects

- **Study Group**: The Southeast Asian stream frogs (Limnonectes kuhlii complex), a group that includes multiple genetically distinct but morphologically similar species.
- **Specific Features**: The texture of the hind limb skin (a pattern formed by raised nodules) is a morphological feature that can help distinguish species. Traditionally, this feature has been described qualitatively, as "smooth to rough" or "dense to sparse," without quantitative standards.

### Research Methods

1. **Image Classification**:
   - The researchers trained a Convolutional Neural Network (CNN) on an image dataset to classify known species automatically.
   - Image preprocessing included cropping images of frog hind limbs, removing background noise, and adjusting background color to ensure image consistency.
2. **Out-of-Distribution Detection**:
   - The researchers applied Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) to the output features of intermediate network layers to detect new species.
   - The Mahalanobis distance measures how far a new sample lies from the known categories, and a logistic regression model determines the classification thresholds.

### Experimental Results

- **Classification Accuracy**: In five-fold cross-validation, the overall classification accuracy of the model was about 73.1%. Given the small size of the training set, this result is reasonable.
- **Out-of-Distribution Detection**: The OOD detection method was further validated on the MNIST dataset, where the QDA method showed better classification ability when outputs from multiple layers were used.

### Conclusion

- This study demonstrates the potential of deep learning and machine learning technologies in biological classification and new species detection, especially when dealing with species complexes with high morphological similarity.
- Future research can further optimize the model to improve classification and detection accuracy, providing strong support for biodiversity conservation.
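
The Mahalanobis-distance idea summarized under Research Methods can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the synthetic two-dimensional "features", and the fixed cutoff are all assumptions made for illustration. In particular, the paper is described as using a logistic regression model to set thresholds, whereas this sketch substitutes a hand-picked constant. The `shared_covariance` flag switches between a pooled covariance (LDA-style) and per-class covariances (QDA-style).

```python
import numpy as np


def fit_class_gaussians(features, labels, shared_covariance=True):
    """Estimate class means and precision matrices from feature vectors.

    shared_covariance=True  -> one pooled covariance for all classes (LDA-style)
    shared_covariance=False -> a separate covariance per class (QDA-style)
    """
    classes = np.unique(labels)
    dim = features.shape[1]
    ridge = 1e-6 * np.eye(dim)  # small ridge keeps the covariance invertible
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    if shared_covariance:
        centered = np.vstack([features[labels == c] - means[c] for c in classes])
        pooled = centered.T @ centered / (len(features) - len(classes)) + ridge
        precision = np.linalg.inv(pooled)
        precisions = {c: precision for c in classes}
    else:
        precisions = {
            c: np.linalg.inv(np.cov(features[labels == c], rowvar=False) + ridge)
            for c in classes
        }
    return means, precisions


def nearest_class_distance(x, means, precisions):
    """Return the closest class and its squared Mahalanobis distance to x."""
    distances = {}
    for c, mean in means.items():
        diff = x - mean
        distances[c] = float(diff @ precisions[c] @ diff)
    nearest = min(distances, key=distances.get)
    return nearest, distances[nearest]


def is_out_of_distribution(x, means, precisions, threshold):
    """Flag x as OOD when it is far from every known class.

    `threshold` is a hypothetical fixed cutoff on the squared distance;
    it stands in for the learned thresholds described in the summary.
    """
    _, distance = nearest_class_distance(x, means, precisions)
    return distance > threshold


# Synthetic stand-in for intermediate-layer features of two known classes.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal([0.0, 0.0], 0.5, size=(200, 2)),
    rng.normal([5.0, 5.0], 0.5, size=(200, 2)),
])
labels = np.repeat([0, 1], 200)

means, precisions = fit_class_gaussians(features, labels, shared_covariance=False)
cutoff = 13.8  # illustrative value only, not taken from the paper
print(is_out_of_distribution(np.array([0.1, -0.1]), means, precisions, cutoff))
print(is_out_of_distribution(np.array([20.0, -20.0]), means, precisions, cutoff))
```

In a real pipeline, the feature vectors would come from one or more intermediate layers of the trained CNN rather than from a random generator, and the cutoff would be chosen on held-out data instead of being fixed by hand.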