STRUCT: a statistical approach to identify RNA secondary structures from raw sequencing data, bypassing multiple sequence alignment

Julie Fangran Wang, Arjun Rustagi, Julia Salzman
DOI: https://doi.org/10.1101/2024.10.03.616574
2024-12-04
Abstract: RNA secondary and tertiary structures are essential to life. Experimental methods to detect RNA structure, such as X-ray crystallography and chemical probing, are incisive but suffer from low throughput and dimensionality. Computational approaches, leveraging evolutionary signals from correlated mutations, provide an alternative means to infer RNA structures. However, these methods require assembly and face challenges due to statistical biases inherent in multiple sequence alignment (MSA). Furthermore, these methods cannot exploit a given RNA element's full spectrum of natural sequence variations. Here, we introduce STRUCT (Statistical Testing of RNA Units with Covariation Traits), an assembly-free, MSA-free, and metadata-free statistical method for identifying conserved RNA structures from raw sequencing data, quantifying base-pair covariations or stem variation exclusion in the putative RNA structures. We show STRUCT rediscovers known HIV structural elements and identifies conserved rRNA structures in metatranscriptomics samples. Moreover, STRUCT finds viral structures in mosquito metatranscriptomics samples de novo, including previously unannotated viral genomes, highlighting the method's potential for viral discovery. STRUCT is an ultra-fast, easy-to-use, and robust tool that excels in high-throughput RNA structure prediction and hypothesis generation, presenting a novel approach for discovering structural RNA elements.
Genomics