Discovery and Quantification of Long-Range RNA Base Pairs in Coronavirus Genomes with SEARCH-MaP and SEISMIC-RNA

Matthew F Allan, Justin Aruda, Jesse S Plung, Scott L Grote, Yves J Martin, Albéric A de Lajarte, Mark Bathe, Silvi Rouskin
DOI: https://doi.org/10.1101/2024.04.29.591762
2024-07-28
Abstract: RNA molecules perform a diversity of essential functions for which their linear sequences must fold into higher-order structures. Techniques including crystallography and cryogenic electron microscopy have revealed 3D structures of ribosomal, transfer, and other well-structured RNAs; while chemical probing with sequencing facilitates secondary structure modeling of any RNAs of interest, even within cells. Ongoing efforts continue increasing the accuracy, resolution, and ability to distinguish coexisting alternative structures. However, no method can discover and quantify alternative structures with base pairs spanning arbitrarily long distances, an obstacle for studying viral, messenger, and long noncoding RNAs, which may form long-range base pairs. Here, we introduce the method of Structure Ensemble Ablation by Reverse Complement Hybridization with Mutational Profiling (SEARCH-MaP) and software for Structure Ensemble Inference by Sequencing, Mutation Identification, and Clustering of RNA (SEISMIC-RNA). We use SEARCH-MaP and SEISMIC-RNA to discover that the frameshift stimulating element of SARS coronavirus 2 base-pairs with another element 1 kilobase downstream in nearly half of RNA molecules, and that this structure competes with a pseudoknot that stimulates ribosomal frameshifting. Moreover, we identify long-range base pairs involving the frameshift stimulating element in other coronaviruses including SARS coronavirus 1 and transmissible gastroenteritis virus, and model the full genomic secondary structure of the latter. These findings suggest that long-range base pairs are common in coronaviruses and may regulate ribosomal frameshifting, which is essential for viral RNA synthesis. We anticipate that SEARCH-MaP will enable solving many RNA structure ensembles that have eluded characterization, thereby enhancing our general understanding of RNA structures and their functions.
SEISMIC-RNA, software for analyzing mutational profiling data at any scale, could power future studies on RNA structure and is available on GitHub and the Python Package Index.
Molecular Biology
What problem does this paper attempt to address?
The main problem this paper addresses is how to discover and quantify long-range RNA base pairs in coronavirus genomes. Specifically, the researchers developed a new experimental method, Structure Ensemble Ablation by Reverse Complement Hybridization with Mutational Profiling (SEARCH-MaP), and a corresponding software tool, Structure Ensemble Inference by Sequencing, Mutation Identification, and Clustering of RNA (SEISMIC-RNA), to detect and quantify base pairs spanning arbitrarily long distances in RNA molecules.

### Main problems

1. **Discovering long-range RNA base pairs**:
   - Existing methods cannot discover and quantify RNA base pairs spanning arbitrarily long distances, which limits research on viral RNAs, messenger RNAs, and long noncoding RNAs.
   - Using SEARCH-MaP and SEISMIC-RNA, the researchers discovered a structure in SARS-CoV-2 that contains multiple long-range base pairs and folds in nearly half of the genomic RNA molecules.
2. **Quantifying the proportion of molecules that form long-range base pairs**:
   - The researchers not only discovered these long-range base pairs but also quantified, through cluster analysis, the fraction of RNA molecules in which they form.
   - For example, they found that two inner stems of a specific structure (the FSE-arch) in SARS-CoV-2 form in 47% ± 4% of the molecules.
3. **Exploring the functions of long-range base pairs**:
   - These long-range base pairs may help regulate ribosomal frameshifting, which is essential for viral RNA synthesis.
   - The researchers found that these long-range base pairs compete with a pseudoknot that stimulates ribosomal frameshifting, and may thereby affect the synthesis of viral proteins.

### Methods and results

- **SEARCH-MaP**:
  - Antisense oligonucleotides (ASOs) are hybridized to specific regions to block their base pairing, followed by chemical probing and mutational profiling.
  - Comparing the mutational profiles obtained with and without an ASO reveals whether long-range base pairs exist.
- **SEISMIC-RNA**:
  - Processes and analyzes mutational profiling data, clusters reads into distinct RNA structures, and supports secondary structure prediction.
  - Through cluster analysis, the researchers determined the proportions and chemical reactivities of the different structures.

### Significance

- **Technological progress**:
  - Provides a new method and tool that can detect and quantify long-range RNA base pairs, filling a gap left by existing methods.
  - These tools are expected to be applicable to other RNA molecules, further advancing the understanding of RNA structure and function.
- **Biological significance**:
  - Reveals the prevalence of long-range base pairs in coronavirus genomes and their possible role in regulating viral protein synthesis.
  - These findings help explain the replication mechanism of the virus and may provide new targets for antiviral drug development.

In conclusion, this paper addresses the problem of detecting and quantifying long-range RNA base pairs by developing a new experimental method and software tool, opening new avenues for research on RNA structure and function.
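
The with/without-ASO comparison described for SEARCH-MaP can be illustrated with a minimal Python sketch. It is not SEISMIC-RNA's API or the authors' analysis: the function name `candidate_partner_positions`, the `min_increase` threshold, and the toy mutation rates are all hypothetical and chosen only for illustration. The sketch assumes that bases which normally pair with the ASO-targeted region become unpaired when the ASO is bound, and therefore show a higher mutation rate in the ASO-treated sample.

```python
from typing import List, Sequence, Tuple


def candidate_partner_positions(
    mu_no_aso: Sequence[float],
    mu_with_aso: Sequence[float],
    aso_region: Tuple[int, int],
    min_increase: float = 0.02,  # hypothetical threshold, not from the paper
) -> List[int]:
    """Return 0-based positions outside the ASO target whose mutation rate
    rises by at least `min_increase` when the ASO is added."""
    if len(mu_no_aso) != len(mu_with_aso):
        raise ValueError("profiles must have the same length")
    start, end = aso_region  # half-open interval [start, end)
    hits = []
    for pos, (before, after) in enumerate(zip(mu_no_aso, mu_with_aso)):
        if start <= pos < end:
            continue  # skip the bases bound by the ASO itself
        if after - before >= min_increase:
            hits.append(pos)  # candidate partner of the blocked region
    return hits


# Toy, made-up mutation rates (NOT data from the paper).
no_aso = [0.01, 0.01, 0.05, 0.01, 0.01, 0.04, 0.01, 0.01]
with_aso = [0.00, 0.00, 0.05, 0.01, 0.01, 0.04, 0.06, 0.05]
hits = candidate_partner_positions(no_aso, with_aso, aso_region=(0, 2))
print(hits)  # [6, 7]
```

In this toy input only positions 6 and 7 rise above the threshold, so they are flagged as candidate pairing partners of the blocked region at positions 0 and 1. A real analysis would also need replicates, normalization, and a statistical test rather than a fixed cutoff.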