Abstract: Automated invertebrate classification using computer vision has shown significant potential to improve specimen processing efficiency. However, challenges such as invertebrate diversity and morphological similarity among taxa can make it difficult to infer fine-scale taxonomic classifications using computer vision. As a result, many invertebrate computer vision models are limited to classifications at coarser levels, such as family or order. Here we propose a novel framework that combines computer vision and bulk DNA metabarcoding specimen processing pipelines to improve the accuracy and taxonomic granularity of individual specimen classifications. To improve classification accuracy, our framework uses multimodal fusion models that combine image data with DNA-based assemblage data. To refine the taxonomic granularity of the models' classifications, our framework cross-references them with DNA metabarcoding detections from bulk samples. We demonstrated this framework using a continental-scale invertebrate bycatch dataset collected by the National Ecological Observatory Network. The dataset included 17 taxa spanning three phyla (Annelida, Arthropoda, and Mollusca); the finest starting taxonomic granularity of these taxa was order level. Using this framework, we reached a classification accuracy of 79.6% across the 17 taxa with real DNA assemblage data, and 83.6% when the assemblage data were error-free: increases of 2.2 and 6.2 percentage points, respectively, over a model trained only on images. After cross-referencing with the DNA metabarcoding detections, we improved taxonomic granularity in up to 72.2% of classifications, with up to 5.7% reaching species level.
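One common way to let a classifier use assemblage data is feature-level (concatenation) fusion. The abstract does not specify the fusion architecture, so the sketch below is only an illustration under that assumption: a precomputed image embedding is concatenated with a binary assemblage vector (1 if DNA metabarcoding detected a taxon in the specimen's bulk sample) and passed to a linear softmax head. The function names, weights, and toy features are hypothetical; in practice the weights would be learned. The toy case shows how assemblage information can separate two taxa whose image features alone are indistinguishable.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(
    image_embedding: Sequence[float],
    assemblage: Sequence[int],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Hypothetical concatenation (feature-level) fusion with a linear softmax head.

    image_embedding: features from an image backbone (assumed precomputed).
    assemblage: assumed presence/absence vector of DNA-detected taxa in the bulk sample.
    weights/bias: one row/entry per class (illustrative values, normally learned).
    """
    fused = [float(x) for x in image_embedding] + [float(a) for a in assemblage]
    if len(weights) != len(bias):
        raise ValueError("need one bias term per class")
    logits = []
    for row, b in zip(weights, bias):
        if len(row) != len(fused):
            raise ValueError("weight row length must match fused feature length")
        logits.append(sum(w * x for w, x in zip(row, fused)) + b)
    return softmax(logits)


# Toy usage: two classes whose image features are identical (morphologically similar).
W = [[1.0, 1.0, 2.0, 0.0],  # class 0: two image weights, then two assemblage weights
     [1.0, 1.0, 0.0, 2.0]]  # class 1
B = [0.0, 0.0]
ambiguous_image = [0.5, 0.5]
print(fuse_and_classify(ambiguous_image, [1, 0], W, B))  # favors class 0
print(fuse_and_classify(ambiguous_image, [0, 1], W, B))  # favors class 1
```

With an all-zero assemblage vector the two classes receive equal probability, so in this toy setup any separation comes entirely from the DNA-derived features.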
By providing computer vision models with coincident DNA assemblage data, and refining individual classifications using DNA metabarcoding detections, our framework has the potential to greatly expand the capabilities of biological computer vision classifiers. It allows computer vision classifiers to infer taxonomically fine-grained classifications where this would otherwise be difficult or impossible because of morphological similarity or data scarcity. The framework is not limited to terrestrial invertebrates and could be applied in any instance where image and DNA metabarcoding data are collected concurrently.
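The cross-referencing step can be pictured as a lowest-common-ancestor lookup. This is a minimal sketch under stated assumptions, not the authors' implementation: each image-based prediction and each DNA detection is assumed to be a lineage tuple ordered from phylum downward, the helper names `refine_classification` and `common_prefix` are hypothetical, and the rule shown (keep only detections consistent with the predicted taxon, then refine to the deepest taxon they all share) is one plausible reading of cross-referencing classifications with bulk-sample detections.

```python
from typing import List, Sequence, Tuple

# Assumed, simplified rank hierarchy from coarsest to finest.
RANKS = ("phylum", "class", "order", "family", "genus", "species")

Lineage = Tuple[str, ...]  # e.g. ("Arthropoda", "Insecta", "Coleoptera", ...)


def common_prefix(lineages: Sequence[Lineage]) -> Lineage:
    """Deepest lineage prefix shared by all lineages (their lowest common ancestor)."""
    prefix: List[str] = []
    # zip stops at the shortest lineage, so coarsely resolved detections cap the depth.
    for names in zip(*lineages):
        if all(n == names[0] for n in names):
            prefix.append(names[0])
        else:
            break
    return tuple(prefix)


def refine_classification(
    predicted: Lineage, detections: Sequence[Lineage]
) -> Tuple[Lineage, str]:
    """Hypothetical refinement of an image-based prediction using bulk-sample DNA detections."""
    depth = len(predicted)
    if not 1 <= depth <= len(RANKS):
        raise ValueError("predicted lineage must cover 1 to 6 ranks")
    # Keep only detections that fall under the predicted taxon.
    matches = [d for d in detections if d[:depth] == predicted]
    if not matches:
        # No supporting detection: keep the coarse image-based label.
        return predicted, RANKS[depth - 1]
    refined = common_prefix(matches)
    return refined, RANKS[len(refined) - 1]
```

For example, with hypothetical detections of *Pterostichus melanarius* (Carabidae) plus one Lycosidae and one Linyphiidae record in the same sample, a specimen classified as Coleoptera would be refined to species, while one classified as Araneae would stay at order level, because the detections alone cannot say which spider family it belongs to.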
