A comprehensive evaluation of taxonomic classifiers in marine vertebrate eDNA studies

Philipp E. Bayer, Adam Bennett, Georgia Nester, Shannon Corrigan, Eric J. Raes, Allison S. McInnes, Madalyn Cooper, Marcelle E. Ayad, Philip McVey, Anya Kardailsky, Jessica Pearce, Matthew W. Fraser, Priscila Goncalves, Stephen Burnell, Sebastian Rauschert
DOI: https://doi.org/10.1101/2024.02.15.580601
2024-02-17
Abstract: Environmental DNA (eDNA) metabarcoding is a widely used tool for surveying marine vertebrate biodiversity. To this end, many computational tools have been released and a plethora of bioinformatic approaches are used for eDNA-based community composition analysis. Simulation studies and careful evaluation of taxonomic classifiers are essential to establish reliable benchmarks to improve accuracy and reproducibility of eDNA-based findings. Here we present a comprehensive evaluation of nine taxonomic classifiers exploring three widely used mitochondrial markers (12S rDNA, 16S rDNA, and COI) in Australian marine vertebrates. Curated reference databases and exclusion database tests were used to simulate diverse species compositions, including three positive control and two negative control datasets. Using these simulated datasets, we were able to identify between 19% and 85% of marine vertebrate species using mitochondrial markers. We show that MMSeqs2 and Metabuli generally outperform BLAST with 10% and 11% higher F1 scores for 12S and 16S rDNA markers, respectively, and that Naive Bayes Classifiers such as Mothur outperform sequence-based classifiers except MMSeqs2 for COI markers by 11%. Database exclusion tests reveal that MMSeqs2 and BLAST are less susceptible to false positives compared to Kraken2 with default parameters. Based on these findings, we recommend that MMSeqs2 is used for taxonomic classification of marine vertebrates given its ability to improve species-level assignments while reducing the number of false positives. Our work contributes to the establishment of best practices in eDNA-based biodiversity analysis to ultimately increase the reliability of this monitoring tool in the context of marine vertebrate conservation.
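The abstract compares classifiers by F1 score and by their susceptibility to false positives. As a rough illustration of how such a score can be computed for a simulated community, the sketch below treats the known species in a mock sample and the species reported by a classifier as two sets. This is an assumption about the general approach, not the authors' evaluation pipeline; the function name `species_f1` and the placeholder species labels are hypothetical.

```python
def species_f1(expected, predicted):
    """Return (precision, recall, F1) for species-level assignments.

    expected:  species known to be in the simulated sample
    predicted: species reported by a classifier for that sample
    """
    expected, predicted = set(expected), set(predicted)

    true_pos = len(expected & predicted)    # correctly reported species
    false_pos = len(predicted - expected)   # reported but not in the sample
    false_neg = len(expected - predicted)   # in the sample but missed

    # Guard against empty denominators (e.g. a classifier that reports nothing).
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder labels, not real results from the study.
truth = {"species_A", "species_B", "species_C", "species_D"}
calls = {"species_A", "species_B", "species_E"}

precision, recall, f1 = species_f1(truth, calls)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In this toy case two of the four expected species are recovered and one reported species is a false positive, so precision is 2/3, recall is 1/2, and F1 is 4/7. A negative control corresponds to an empty `expected` set, where every reported species counts as a false positive.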
Ecology