RNA threading with secondary structure and sequence profile

Zongyang Du, Zhenling Peng, Jianyi Yang
DOI: https://doi.org/10.1093/bioinformatics/btae080
IF: 5.8
2024-02-01
Bioinformatics
Abstract: Motivation: RNA threading aims to identify remote homologies for template-based modeling of RNA 3D structure. Existing RNA alignment methods primarily rely on secondary structure alignment. They are often time- and memory-consuming, limiting large-scale applications. In addition, the accuracy is far from satisfactory. Results: Using RNA secondary structure and sequence profile, we developed a novel RNA threading algorithm, named RNAthreader. To enhance the alignment process and minimize memory usage, a novel approach has been introduced to simplify RNA secondary structures into compact diagrams. RNAthreader employs a two-step methodology. Initially, integer programming and dynamic programming are combined to create an initial alignment for the simplified diagram. Subsequently, the final alignment is obtained using dynamic programming, taking into account the initial alignment derived from the previous step. The benchmark test on 80 RNAs illustrates that RNAthreader generates more accurate alignments than other methods, especially for RNAs with pseudoknots. Another benchmark, involving 30 RNAs from the RNA-Puzzles experiments, exhibits that the models constructed using RNAthreader templates have a lower average RMSD than those created by alternative methods. Remarkably, RNAthreader takes less than two hours to complete alignments with ∼5000 RNAs, which is 3–40 times faster than other methods. These compelling results suggest that RNAthreader is a promising algorithm for RNA template detection. Availability and implementation: https://yanglab.qd.sdu.edu.cn/RNAthreader
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology
What problem does this paper attempt to address?
The paper addresses remote homology detection for template-based modeling of RNA 3D structure. Existing RNA alignment methods rely mainly on secondary structure alignment; they are often time-consuming and memory-intensive, which limits their use in large-scale applications. Their accuracy is also unsatisfactory, especially for RNAs containing pseudoknots. The paper therefore proposes a new RNA threading algorithm, RNAthreader, which aims to improve the accuracy and efficiency of RNA template alignment by simplifying RNA secondary structures into compact diagrams and combining them with sequence profiles.

### Main contributions

1. **Algorithm innovation**: RNAthreader simplifies the RNA secondary structure into a compact graph, combines integer programming and dynamic programming to build an initial alignment on that graph, and then uses dynamic programming to generate the final alignment.
2. **Performance improvement**: Benchmark tests show that RNAthreader outperforms other methods in both alignment accuracy and speed, especially for RNAs containing pseudoknots.
3. **Practical application**: RNAthreader completes alignments against about 5,000 RNAs in less than two hours, which is 3 to 40 times faster than other methods, showing its potential for large-scale use.

### Method overview

- **Data preparation**: A template library containing 18,195 RNAs was constructed, and training and test sets were prepared.
- **Alignment process**:
  - **Initial alignment**: Simplify the RNA secondary structure into a linear arc graph and generate the initial alignment by combining integer programming and dynamic programming.
  - **Final alignment**: Starting from the initial alignment, use dynamic programming to generate the final nucleotide-level alignment.
- **Performance evaluation**: Evaluate the alignment quality through alignment coverage and RMSD of the aligned region, and evaluate the model quality through RMSD of the all-atom model.

### Experimental results

- **TE80 data set**: RNAthreader is superior to other methods in alignment coverage and RMSD under different thresholds, especially when dealing with RNAs containing pseudoknots.
- **RNA-Puzzles data set**: RNAthreader performs best in RMSD of the full-length model, although its average alignment coverage is not the highest.

### Conclusion

RNAthreader significantly improves the accuracy and efficiency of RNA template alignment by simplifying the RNA secondary structure graph and combining sequence features, providing a new solution for RNA structure prediction.
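
To make the simplification step in the method overview more concrete, here is a minimal Python sketch. It assumes the secondary structure is given in dot-bracket notation and that a compact diagram can be approximated by collapsing each run of stacked base pairs into a single arc. This summary does not specify RNAthreader's actual reduction, so the grouping rule and the names `base_pairs`, `compact_diagram`, and `crossing` are hypothetical illustrations, not the authors' implementation.

```python
# Illustrative sketch only: the stem-collapsing rule is an assumption,
# not RNAthreader's published simplification scheme.
BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def base_pairs(dot_bracket):
    """Return sorted 0-based (i, j) base pairs from a dot-bracket string.

    Each bracket type gets its own stack, so pseudoknots written with
    [] {} <> are parsed alongside the nested () pairs.
    """
    closers = {close: open_ for open_, close in BRACKETS.items()}
    stacks = {open_: [] for open_ in BRACKETS}
    pairs = []
    for pos, ch in enumerate(dot_bracket):
        if ch in BRACKETS:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def compact_diagram(dot_bracket):
    """Collapse runs of stacked pairs into arcs (five_prime, three_prime, length)."""
    arcs = []
    for i, j in base_pairs(dot_bracket):
        if arcs:
            start, end, length = arcs[-1]
            # (i, j) extends the previous arc if it stacks directly inside it.
            if (start + length, end - length) == (i, j):
                arcs[-1] = (start, end, length + 1)
                continue
        arcs.append((i, j, 1))
    return arcs


def crossing(arc_a, arc_b):
    """Return True if the outermost pairs of two arcs interleave (a pseudoknot)."""
    (a5, a3, _), (b5, b3, _) = sorted([arc_a, arc_b])
    return a5 < b5 < a3 < b3


if __name__ == "__main__":
    arcs = compact_diagram("((..[[..))..]]")
    print(arcs)
    print(crossing(arcs[0], arcs[1]))
```

For the toy input `((..[[..))..]]`, the 14 nucleotides reduce to two arcs that cross each other, which is the pseudoknot case. Working with a handful of arcs instead of every nucleotide is the kind of reduction that could shrink the search space for an initial alignment step before a nucleotide-level pass.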