Abstract: Virus discovery by genomics and metagenomics has empowered studies of viromes, facilitated characterization of pathogen epidemiology, and redefined our understanding of the natural genetic diversity of viruses, with profound functional and structural implications. Here we employed a data-driven virus discovery approach that directly queries unprocessed sequencing data in a highly parallelized way and uses a targeted viral genome assembly strategy across a wide range of sequence similarity. By screening more than 269,000 datasets submitted by numerous authors to the Sequence Read Archive, and by using two metrics that quantitatively assess assembly quality, we discovered 40 nidoviruses from six virus families whose members infect vertebrate hosts. They form 13 putative viral subfamilies and 32 putative genera. They include 11 coronaviruses with bisegmented genomes from fishes and amphibians, a giant 36.1 kilobase coronavirus genome with a duplicated spike glycoprotein (S) gene, 11 tobaniviruses, and 17 additional corona-, arteri-, cremega-, nanhypo- and nangoshaviruses. Genome segmentation emerged in a single evolutionary event in the monophyletic lineage encompassing the subfamily Pitovirinae. We recovered the bisegmented genome sequences of two coronaviruses from RNA samples of 69 infected fishes and validated the presence of poly(A) tails on both segments using 3'RACE PCR and subsequent Sanger sequencing. We report a genetic linkage between accessory and structural proteins whose phylogenetic relationships and evolutionary distances are incongruent with the phylogeny of replicase proteins. We rationalize these observations in a model of inter-family S recombination involving at least five ancestral corona- and tobaniviruses of aquatic hosts. In support of this model, we describe an individual fish co-infected with members of the families Coronaviridae and Tobaniviridae. Our results expand the known scale of the extraordinary evolutionary plasticity in nidoviral genome architecture and call for revisiting the fundamentals of genome expression, virus particle biology, host range and ecology of vertebrate nidoviruses.

Research in virology is primarily motivated by human pathogens, such as SARS-CoV-2 in the case of the family Coronaviridae in the order Nidovirales. Studies of these and a few model viruses describe virus-host interactions at the molecular level and are essential for developing virus control measures, but generalizations drawn from them must accommodate the vast natural diversity of viruses. Here, we redefine our understanding of the genetic and genomic diversity of corona- and other nidoviruses in poorly sampled hosts. Using high-performance computing, we mine more than 269,000 publicly accessible raw sequencing datasets for viral sequences and discover 40 nidoviruses, including 13 coronaviruses, from a wide range of vertebrates. Some of the novel viruses from aquatic hosts have extraordinary features, such as segmented genomes and recombinant genes coding for structural proteins. Our study suggests that gene exchange between diverse nidovirus species from different virus families might be more frequent than previously thought and can result in abrupt genomic innovations, which in turn might facilitate host jumps even across vertebrate class borders. The growing list of newly discovered (corona)viruses places the knowledge acquired in the wet lab for a few viruses in an evolutionary perspective that spans different hosts and scales of virus divergence.
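The text says the approach "directly queries unprocessed sequencing data in a highly parallelized way" but does not describe the query method. As a rough illustration of that idea only, the sketch below swaps in a simple exact nucleotide k-mer containment screen, applied to many read sets concurrently. It is not the authors' pipeline. All function names, the k-mer length `K = 21`, the `min_shared` threshold and the in-memory read lists are hypothetical choices made for the example. Exact nucleotide k-mers only catch close relatives of the reference; detecting viruses across a wide range of sequence similarity generally calls for translated, protein-level searches instead.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Iterable

# Hypothetical parameters chosen for illustration, not taken from the study.
K = 21  # shorter k is more sensitive to divergent reads but less specific

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a nucleotide sequence (uppercased)."""
    return seq.upper().translate(_COMP)[::-1]


def kmers(seq: str, k: int = K) -> set[str]:
    """All length-k substrings of seq; empty if the sequence is shorter than k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Iterable[str], k: int = K) -> frozenset[str]:
    """Collect k-mers of each reference and its reverse complement,
    so reads from either strand can match."""
    index: set[str] = set()
    for ref in references:
        index |= kmers(ref, k)
        index |= kmers(revcomp(ref), k)
    return frozenset(index)


def screen_dataset(job: tuple) -> tuple[str, int]:
    """Count reads in one dataset sharing at least min_shared k-mers with the index."""
    name, reads, index, k, min_shared = job
    hits = sum(1 for read in reads if len(kmers(read, k) & index) >= min_shared)
    return name, hits


def screen_all(datasets: dict[str, list[str]], index: frozenset[str],
               k: int = K, min_shared: int = 2, workers: int = 4) -> dict[str, int]:
    """Screen every dataset concurrently and return hit counts per dataset.
    Threads keep the sketch portable; for CPU-bound work in CPython a
    ProcessPoolExecutor (behind an `if __name__ == "__main__"` guard) would
    give real parallelism."""
    jobs = [(name, reads, index, k, min_shared) for name, reads in datasets.items()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(screen_dataset, jobs))
```

In a real screen the reads would be streamed from FASTQ files rather than held in memory. Datasets with nonzero hit counts would then be passed on to a targeted assembly step.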

Deep mining of the Sequence Read Archive reveals major genetic innovations in coronaviruses and other nidoviruses of aquatic vertebrates