Skimming for barcodes: rapid production of mitochondrial genome and nuclear ribosomal repeat reference markers through shallow shotgun sequencing

Mykle L. Hoban, Jonathan Whitney, Allen G. Collins, Christopher Meyer, Katherine R. Murphy, Abigail J. Reft, Katherine E. Bemis
DOI: https://doi.org/10.7717/peerj.13790
2022-08-06
Abstract: DNA barcoding is critical to conservation and biodiversity research, yet public reference databases are incomplete. Existing barcode databases are biased toward cytochrome oxidase subunit I (COI) and frequently lack associated voucher specimens or geospatial metadata, which can hinder reliable species assignments. The emergence of metabarcoding approaches such as environmental DNA (eDNA) has created a need for multi-marker techniques combined with barcode reference databases backed by voucher specimens. Reference barcodes have traditionally been generated by Sanger sequencing; however, sequencing multiple markers this way is costly for large numbers of specimens, requires multiple separate PCR reactions, and limits the resulting sequences to the targeted regions. High-throughput sequencing techniques such as genome skimming enable assembly of both complete mitogenomes, which contain the most commonly used barcoding loci (e.g., COI, 12S, 16S), and nuclear ribosomal repeat regions (e.g., ITS1&2, 18S). We evaluated the feasibility of genome skimming for generating barcode reference databases for marine fishes by assembling complete mitogenomes and nuclear ribosomal repeats. We tested genome skimming across a taxonomically diverse selection of 12 marine fish species from the collections of the National Museum of Natural History, Smithsonian Institution. We generated two sequencing libraries per species to test the impact of shearing method (enzymatic or mechanical), extraction method (kit-based or automated), and input DNA concentration. We produced complete mitogenomes for all non-chondrichthyans (11/12 species) and assembled nuclear ribosomal repeats (18S-ITS1-5.8S-ITS2-28S) for all taxa. The quality and completeness of mitogenome assemblies were not affected by shearing method, extraction method, or input DNA concentration.
Our results reaffirm that genome skimming is an efficient and (at scale) cost-effective method for generating all mitochondrial and common nuclear DNA barcoding loci for multiple species simultaneously. The approach has great potential to scale to future projects and to help complete barcode reference databases for marine fishes.
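The practical advantage described above is that one complete mitogenome assembly already contains COI, 12S, and 16S, so each marker can be cut from the assembly instead of being amplified by its own PCR. The sketch below illustrates only that slicing step and is not the authors' pipeline. The `Feature` type, the function names, the coordinates, and the toy sequence are all hypothetical; real coordinates would come from annotating the assembled mitogenome. Because the mitogenome is circular, a feature whose end coordinate precedes its start is assumed here to wrap across the origin. An assembled nuclear ribosomal repeat unit (18S-ITS1-5.8S-ITS2-28S) would be handled the same way with `circular=False`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Feature:
    """Hypothetical annotated region: 0-based start, exclusive end."""
    name: str
    start: int
    end: int


def extract_region(seq: str, feat: Feature, circular: bool = True) -> str:
    """Return the subsequence covered by one annotated feature.

    On a circular molecule (e.g. a mitogenome), a feature with end < start
    is assumed to wrap across the origin of the assembly.
    """
    if feat.end >= feat.start:
        return seq[feat.start:feat.end]
    if not circular:
        raise ValueError(f"{feat.name}: end precedes start on a linear sequence")
    # Wrap-around: tail of the sequence followed by its head.
    return seq[feat.start:] + seq[:feat.end]


def extract_barcodes(seq: str, features: Iterable[Feature],
                     wanted: Set[str], circular: bool = True) -> Dict[str, str]:
    """Collect only the requested barcode loci from one assembly."""
    return {f.name: extract_region(seq, f, circular)
            for f in features if f.name in wanted}


# Toy 16-bp "mitogenome" with made-up coordinates (illustration only).
mito = "AAAACCCCGGGGTTTT"
features = [
    Feature("12S", 0, 4),
    Feature("16S", 4, 8),
    Feature("ND1", 8, 12),   # annotated but not requested below
    Feature("COI", 14, 2),   # wraps across the origin
]
barcodes = extract_barcodes(mito, features, wanted={"COI", "12S", "16S"})
print(barcodes)  # {'12S': 'AAAA', '16S': 'CCCC', 'COI': 'TTAA'}
```

The design point is that adding another marker means adding an entry to `wanted`, not designing and running another PCR, which is the scaling argument the abstract makes for genome skimming over per-marker Sanger sequencing.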