Biodiversity Image Quality Metadata Augments Convolutional Neural Network Classification of Fish Species

Jeremy Leipzig, Yasin Bakis, Xiaojun Wang, Mohannad Elhamod, Kelly Diamond, Wasila Dahdul, Anuj Karpatne, Murat Maga, Paula Mabee, Henry L. Bart, Jane Greenberg
DOI: https://doi.org/10.1007/978-3-030-71903-6_1
2021-01-01
Abstract: Biodiversity image repositories are crucial sources for training machine learning approaches to support biological research. Metadata about object (e.g., image) quality is a putatively important prerequisite to selecting samples for these experiments. This paper reports on a study demonstrating the importance of image quality metadata for a species classification experiment involving a corpus of 1935 fish specimen images, which were annotated with 22 metadata quality properties. When used by a convolutional neural network approach to species identification, a small subset of high-quality images produced an F1 score of 0.41, compared to 0.35 for a taxonomically matched subset of low-quality images. Using the full corpus of images revealed that image quality differed between correctly classified and misclassified images. We found anatomical feature visibility was the most important quality feature for classification accuracy. We suggest biodiversity image repositories consider adopting a minimal set of image quality metadata to support machine learning.