Abstract:

Background: The mitochondrial genomes (mitogenomes) of metazoans generally include the same set of protein-coding genes, which ensures the homology of mitochondrial genes between species. Mitochondrial genes are often used as reference data for species identification based on genetic data (DNA barcoding). The need for such reference data has been increasing with the application of environmental DNA (eDNA) analysis to environmental assessments. Meanwhile, the number of publicly available next-generation sequencing (NGS) reads in the NCBI Sequence Read Archive (SRA) has been growing. Such freely available NGS reads are promising sources for assembling the mitochondrial protein-coding genes (mPCGs) of organisms whose mitochondrial genes are not available in GenBank. The present study aimed to assemble annelid mPCGs from raw data deposited in the SRA.

Methods: Recent progress in the classification of Annelida is briefly introduced. The mPCGs of 32 annelid species from 19 families of clitellates and their allies in Sedentaria (echiurans and polychaetes) were newly assembled from reads deposited in the SRA. Assembly was performed with a recently published pipeline, mitoRNA, which includes cycles of Bowtie2 mapping and Trinity assembly. Assembled mPCGs were deposited in GenBank as Third Party Annotation (TPA) data. A phylogenetic tree was reconstructed by maximum likelihood (ML) analysis of the new sequences together with other mPCGs deposited in GenBank.

Results and Discussion: mPCG assembly was largely successful except for Travisia forbesii: only four genes were detected among the assembled contigs of this species, probably because the reads targeted its parasite. Most genes were obtained successfully, whereas atp8, nad2, and nad4l were recovered from only 22–24 species. The high nucleotide substitution rates of these genes might be related to the assembly failures, although nad6, which showed a similarly high substitution rate, was successfully assembled. Although the phylogenetic positions of several lineages were not resolved in the present study, the relationships of some polychaetes and leeches that had not been inferred from transcriptomes were well resolved, probably owing to denser taxon sampling than in previous transcriptome-based phylogenetic analyses. Although NGS data are generally better sources for resolving phylogenetic relationships at both higher and lower taxonomic levels, there are enduring needs for specific mitochondrial gene loci in analyses that do not require high resolution, such as DNA barcoding, eDNA analysis, and phylogenetic analysis among lower taxa. Assembly from publicly available NGS reads would help in designing specific primers for the mitochondrial genes of species whose mitochondrial genes are hard to amplify for Sanger sequencing with universal primers.
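
The abstract describes assembly as cycles of Bowtie2 mapping and Trinity assembly. As a rough illustration only, the sketch below plans the commands for such a bait-and-assemble loop: reads are mapped to a seed set of mPCGs, the mapped pairs are assembled, and the resulting contigs become the bait for the next cycle. This is not the mitoRNA code, and the abstract does not state the pipeline's actual parameters. The function names, file names, directory layout, cycle count, and thread and memory settings are all hypothetical. Only widely documented Bowtie2 and Trinity options are used. The location of Trinity's output FASTA differs between Trinity versions, so the path used here is an assumption.

```python
from pathlib import Path
from typing import List


def bowtie2_build_cmd(reference_fasta: str, index_prefix: str) -> List[str]:
    """Command that indexes the current bait sequences for Bowtie2."""
    return ["bowtie2-build", reference_fasta, index_prefix]


def bowtie2_map_cmd(index_prefix: str, reads_1: str, reads_2: str,
                    mapped_template: str, threads: int = 4) -> List[str]:
    """Command that maps paired reads and keeps the pairs that aligned."""
    return [
        "bowtie2",
        "-x", index_prefix,
        "-1", reads_1,
        "-2", reads_2,
        "-p", str(threads),
        # '%' in the template is replaced by 1 or 2 for the two mates.
        "--al-conc", mapped_template,
        "-S", "/dev/null",  # the SAM output itself is not needed here
    ]


def trinity_cmd(left: str, right: str, out_dir: str,
                cpu: int = 4, max_memory: str = "10G") -> List[str]:
    """Command that assembles the baited read pairs de novo."""
    return [
        "Trinity",
        "--seqType", "fq",
        "--left", left,
        "--right", right,
        "--CPU", str(cpu),
        "--max_memory", max_memory,
        "--output", out_dir,
    ]


def plan_cycles(seed_fasta: str, reads_1: str, reads_2: str,
                workdir: str, n_cycles: int = 2) -> List[List[str]]:
    """Return the commands for n_cycles rounds of map-then-assemble.

    Nothing is executed; each cycle directory would have to exist
    before the commands are run (for example via subprocess.run).
    """
    commands: List[List[str]] = []
    reference = seed_fasta
    for cycle in range(1, n_cycles + 1):
        cycle_dir = Path(workdir) / f"cycle{cycle}"
        index_prefix = str(cycle_dir / "bait")
        trinity_dir = str(cycle_dir / "trinity_out")
        commands.append(bowtie2_build_cmd(reference, index_prefix))
        commands.append(bowtie2_map_cmd(
            index_prefix, reads_1, reads_2, str(cycle_dir / "mapped.%.fq")))
        commands.append(trinity_cmd(
            str(cycle_dir / "mapped.1.fq"),
            str(cycle_dir / "mapped.2.fq"),
            trinity_dir))
        # Assumed output location; it varies between Trinity versions.
        reference = str(Path(trinity_dir) / "Trinity.fasta")
    return commands


if __name__ == "__main__":
    for cmd in plan_cycles("seed_mPCGs.fasta", "reads_1.fq", "reads_2.fq", "work"):
        print(" ".join(cmd))
```

Keeping command construction separate from execution makes the loop easy to inspect before any large SRA download is processed. Each cycle can extend the contigs because reads overlapping the ends of the previous assembly are recruited in the next mapping step.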

A quantitative reference transcriptome for Nematostella vectensis early embryonic development: a pipeline for de novo assembly in emerging model systems

De novo assembly of transcriptomes and differential gene expression analysis using short-read data from emerging model organisms – a brief guide

Updated single cell reference atlas for the starlet anemone Nematostella vectensis

De Novo Assembly And Characterization Of Early Embryonic Transcriptome Of The Horseshoe Crab Tachypleus tridentatus

A comprehensive human embryo reference tool using single-cell RNA-sequencing data

A Comprehensive Human Embryogenesis Reference Tool using Single-Cell RNA-Sequencing Data

Combining independent de novo assemblies optimizes the coding transcriptome for nonconventional model eukaryotic organisms

De Novo Assembly and Validation of Planaria Transcriptome by Massive Parallel Sequencing and Shotgun Proteomics

Untwisting the Caenorhabditis elegans Embryo

Revisiting genomes of non-model species with long reads yields new insights into their biology and evolution

Comprehensive RNA-Seq Analysis Pipeline for Non-Model Organisms and Its Application in Schmidtea mediterranea

Improving transcriptome construction in non-model organisms: integrating manual and automated gene definition in Emiliania huxleyi

Ocean to Tree: Leveraging Single-Molecule RNA-Seq to Repair Genome Gene Models and Improve Phylogenomic Analysis of Gene and Species Evolution

The rise of the starlet sea anemone Nematostella vectensis as a model system to investigate development and regeneration

Soil Nematode Community Profiling Using Reference-Free Mito-Metagenomics

Integrate Heterogeneous NGS and TGS Data to Boost Genome-free Transcriptome Research

TranscriptomeReconstructoR: data-driven annotation of complex transcriptomes

TrAnnoScope: A Modular Snakemake Pipeline for Full-Length Transcriptome Analysis and Functional Annotation

Single-worm long-read sequencing reveals genome diversity in free-living nematodes

De Novo Assembly of Uca minax Transcriptome from Next Generation Sequencing

Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA)