Concurrent prediction of RNA secondary structures with pseudoknots and local 3D motifs in an integer programming framework

Gabriel Loyer, Vladimir Reinharz
DOI: https://doi.org/10.1093/bioinformatics/btae022
IF: 5.8
2024-01-16
Bioinformatics
Abstract:
Motivation: The prediction of RNA structure canonical base pairs from a single sequence, especially pseudoknotted ones, remains challenging in thermodynamic models that approximate the energy of the local 3D motifs joining canonical stems. It has become more and more apparent in recent years that the structural motifs in the loops, composed of noncanonical interactions, are essential for the final shape of the molecule enabling its multiple functions. Our capacity to predict accurate 3D structures is also limited when it comes to the organization of the large intricate network of interactions that form inside those loops.
Results: We previously developed the integer programming framework RNA Motifs over Integer Programming (RNAMoIP) to reconcile RNA secondary structure and local 3D motif information available in databases. We further develop our model to now simultaneously predict the canonical base pairs (with pseudoknots) from base pair probability matrices with or without alignment. We benchmarked our new method over all nonredundant RNAs below 150 nucleotides. We show that the joined prediction of canonical base pairs structure and local conserved motifs (i) improves the ratio of well-predicted interactions in the secondary structure, (ii) predicts well canonical and Wobble pairs at the location where motifs are inserted, (iii) is greatly improved with evolutionary information, and (iv) noncanonical motifs at kink-turn locations.
Availability and implementation: The source code of the framework is available at https://gitlab.info.uqam.ca/cbe/RNAMoIP and an interactive web server at https://rnamoip.cbe.uqam.ca/.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology
What problem does this paper attempt to address?
The paper addresses two key challenges in RNA secondary structure prediction:

1. **Prediction of Pseudoknots**: Traditional thermodynamic models usually assume that the structure contains no crossing interactions (i.e., pseudoknots), because allowing them increases the complexity of the model and reduces its accuracy. However, pseudoknots are common in RNA molecules and can significantly affect RNA function, so methods that accurately predict pseudoknotted secondary structures are needed.
2. **Integration of Local 3D Motifs**: The loop regions of RNA molecules contain many noncanonical interactions, which are crucial to the final shape and function of the RNA. Existing methods are limited in predicting these complex interaction networks, so integrating local 3D motif information into secondary structure prediction is an important way to improve accuracy.

To address these two challenges, the authors extend their integer programming framework, RNA Motifs over Integer Programming (RNAMoIP), so that it accounts for pseudoknots and local 3D motifs while predicting the RNA secondary structure. The method proceeds through the following steps:

- **Decomposing Secondary Structures**: Decompose the RNA secondary structure into multiple substructures without pseudoknots and calculate the base-pairing probability matrix for each substructure.
- **Finding Motif Positions**: Use pattern-matching techniques to find all possible motif positions in the input sequence.
- **Solving the Integer Programming Model**: Optimize the objective function to find the optimal combination of base pairs and motif insertion positions.
- **Iterative Optimization**: Repeat the process until two consecutive iterations give the same result, or until the preset number of iterations or the time limit is reached.

With this method, the authors aim to improve the prediction of pseudoknots and to better capture the noncanonical interactions in local 3D motifs, thereby improving the overall accuracy of secondary structure prediction.
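
To make the motif-finding and optimization steps listed above more concrete, here is a minimal Python sketch. It is not the RNAMoIP model or its code: the function names `find_motif_positions` and `best_pairing`, the toy sequence, the probabilities, and the threshold are all assumptions made for illustration. Motif search is reduced to exact substring matching, and the integer program is replaced by exhaustive enumeration of a tiny 0-1 selection problem in which each nucleotide may appear in at most one base pair. Crossing pairs are deliberately allowed, so the selected set can contain a pseudoknot. Motif insertion terms and the iterative refinement are left out.

```python
from itertools import combinations


def find_motif_positions(sequence, motif):
    """Return every start index where `motif` occurs in `sequence` (overlaps allowed)."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


def best_pairing(pair_probs, threshold=0.1):
    """Pick the set of base pairs with the highest total probability.

    Stand-in for an integer programming solver: every subset of candidate
    pairs is enumerated, which is only feasible for a handful of candidates.
    Constraint: each position belongs to at most one pair.
    Crossing pairs (i < k < j < l) are NOT forbidden, so pseudoknots can appear.
    """
    # Keep only candidate pairs whose probability reaches the (assumed) threshold.
    candidates = [(pair, p) for pair, p in pair_probs.items() if p >= threshold]
    best_score, best_set = 0.0, ()
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            used = [pos for (i, j), _ in subset for pos in (i, j)]
            if len(used) != len(set(used)):
                continue  # some position is used twice: infeasible
            score = sum(p for _, p in subset)
            if score > best_score:
                best_score, best_set = score, subset
    return sorted(pair for pair, _ in best_set), best_score


# Toy input (hypothetical values, not taken from the paper).
toy_sequence = "GCGAAAGCUC"
toy_probs = {(0, 7): 0.9, (1, 6): 0.8, (2, 9): 0.6, (0, 9): 0.5, (1, 7): 0.05}

motif_hits = find_motif_positions(toy_sequence, "AA")
pairs, score = best_pairing(toy_probs)
print("motif start positions:", motif_hits)
print("selected pairs:", pairs, "total probability:", round(score, 3))
```

In this toy input, the pair (2, 9) crosses both (0, 7) and (1, 6), so the selected set includes a pseudoknot. Enumerating subsets grows exponentially with the number of candidate pairs, which is why a real formulation would hand the same variables and constraints to an integer programming solver instead.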