Motif conservation, stability, and host gene expression are the main drivers of snoRNA expression across vertebrates

Étienne Fafard-Couture, Pierre-Étienne Jacques, Michelle S Scott
DOI: https://doi.org/10.1101/gr.277483.122
Abstract: Small nucleolar RNAs (snoRNAs) are structured noncoding RNAs present in multiple copies within eukaryotic genomes. snoRNAs guide chemical modifications on their target RNA and regulate processes like ribosome assembly and splicing. Most human snoRNAs are embedded within host gene introns, the remainder being independently expressed from intergenic regions. We recently characterized the abundance of snoRNAs and their host gene across several healthy human tissues and found that the level of most snoRNAs does not correlate with that of their host gene, with the observation that snoRNAs embedded within the same host gene often differ drastically in abundance. To better understand the determinants of snoRNA expression, we trained machine learning models to predict whether snoRNAs are expressed or not in human tissues based on more than 30 collected features related to snoRNAs and their genomic context. By interpreting the models' predictions, we find that snoRNAs rely on conserved motifs, a stable global structure and terminal stem, and a transcribed locus to be expressed. We observe that these features explain well the varying abundance of snoRNAs embedded within the same host gene. By predicting the expression status of snoRNAs across several vertebrates, we notice that only one-third of all annotated snoRNAs are expressed per genome, as in humans. Our results suggest that ancestral snoRNAs disseminated within vertebrate genomes, sometimes leading to the development of new functions and a probable gain in fitness and thereby conserving features favorable to the expression of these few snoRNAs, the large remainder often degenerating into pseudogenes.