Elena Rivas, Sean R. Eddy
Abstract: We describe a dynamic programming algorithm for predicting optimal RNA secondary structure, including pseudoknots. The algorithm has a worst case complexity of ${\cal O}(N^6)$ in time and ${\cal O}(N^4)$ in storage. The description of the algorithm is complex, which led us to adopt a useful graphical representation (Feynman diagrams) borrowed from quantum field theory. We present an implementation of the algorithm that generates the optimal minimum energy structure for a single RNA sequence, using standard RNA folding thermodynamic parameters augmented by a few parameters describing the thermodynamic stability of pseudoknots. We demonstrate the properties of the algorithm by using it to predict structures for several small pseudoknotted and non-pseudoknotted RNAs. Although the time and memory demands of the algorithm are steep, we believe this is the first algorithm to be able to fold optimal (minimum energy) pseudoknotted RNAs with the accepted RNA thermodynamic model.
What problem does this paper attempt to address?
This paper addresses a central problem in RNA secondary structure prediction: how to predict RNA structures that contain pseudoknots. Traditional dynamic programming algorithms, such as the Zuker algorithm, can only predict structures that obey the "nested" rule, so they cannot handle non-nested structures such as pseudoknots. Because pseudoknots occur in many functionally important RNA molecules, algorithms that can predict them are of considerable practical value.
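To make the "nested" restriction concrete, here is a minimal sketch of a nested-only dynamic program. It is a Nussinov-style base-pair maximization, swapped in for illustration, and not the Zuker energy model or the paper's algorithm. The function name, the `min_loop` parameter, and the allowed-pair set are assumptions made for this example. The recursion only ever combines sub-intervals that lie side by side or one inside the other, which is why crossing pairs are out of its reach.

```python
# Illustrative Nussinov-style DP (assumption: maximizes base-pair count,
# not free energy). It runs in O(N^3) time and O(N^2) space.
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in seq.

    min_loop is the minimum number of unpaired bases enclosed by a pair
    (a hypothetical parameter for this sketch).
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j is unpaired
            # case 2: j pairs with some k, splitting [i, j] into two
            # independent intervals [i, k-1] and [k+1, j-1]
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in ALLOWED:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

Calling `nussinov_max_pairs("GGGAAAUCC")` scores a three-pair hairpin. A pseudoknot would require a pair that reaches from inside `[k+1, j-1]` to outside it, and no term in this recursion can express that.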
### Problems the paper attempts to solve
1. **Limitations of existing algorithms**: Existing RNA secondary structure prediction algorithms, such as the Zuker algorithm, are based on dynamic programming and find the optimal nested structure in O(N^3) time and O(N^2) space. However, these algorithms cannot handle pseudoknots, because pseudoknots violate the nesting rule.
2. **Importance of pseudoknots**: Pseudoknots play important functional roles in a variety of known RNA molecules, such as ribosomal RNA, the catalytic core of group I introns, and RNase P RNA. Pseudoknots are also found at the 3' end of some plant virus RNAs, where they mimic tRNA structures. Algorithms that can predict pseudoknots are therefore crucial for understanding the functions of these RNAs.
3. **Computational complexity challenges**: Handling pseudoknots requires extending the dynamic programming algorithm to non-nested structures, which raises the computational cost. The algorithm proposed in this paper has a worst-case time complexity of O(N^6) and a worst-case space complexity of O(N^4). Although these requirements are steep, the authors believe this is the first algorithm that can use the standard RNA thermodynamic model to predict the optimal (lowest-energy) pseudoknotted structure.
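The nesting rule mentioned in the list above can be stated precisely: two base pairs (i, j) and (k, l) cross, and so form a pseudoknot, when i < k < j < l. The following is a minimal sketch of that check. The function name and the pair-list input format are assumptions made for this example, not something defined in the paper.

```python
def crossing_pairs(pairs):
    """Return every pair of base pairs that violates the nesting rule.

    pairs: iterable of (i, j) index tuples (hypothetical input format).
    Two pairs (i, j) and (k, l) cross when i < k < j < l.
    """
    # Normalize so that each pair is (smaller, larger), then sort by start.
    norm = sorted((min(i, j), max(i, j)) for i, j in pairs)
    found = []
    for a, (i, j) in enumerate(norm):
        # After sorting, every later pair starts at or after i.
        for k, l in norm[a + 1:]:
            if i < k < j < l:
                found.append(((i, j), (k, l)))
    return found


def has_pseudoknot(pairs):
    """True if the structure contains at least one crossing pair."""
    return bool(crossing_pairs(pairs))
```

For example, `has_pseudoknot([(0, 10), (1, 9)])` is false because the second pair sits entirely inside the first, while `has_pseudoknot([(0, 5), (3, 8)])` is true because the two pairs interleave.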
### Main contributions of the algorithm
- **Introduction of gap matrices**: To describe pseudoknots, the authors introduce new gap matrices (whx, vhx, yhx, zhx), which can represent non-nested structures.
- **Graphical representation**: Borrowing Feynman diagrams from quantum field theory, the authors use a graphical representation to simplify the description and implementation of the algorithm.
- **Application of thermodynamic parameters**: The algorithm uses standard RNA folding thermodynamic parameters, augmented by a few parameters that describe the thermodynamic stability of pseudoknots.
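The O(N^4) storage cost of the gap matrices can be illustrated with a rough back-of-the-envelope estimate. The sketch below assumes each gap matrix is indexed by four sequence positions i ≤ k ≤ l ≤ j, so it holds C(N+3, 4) entries; it also assumes four matrices and 4 bytes per entry. The indexing convention, the entry size, and the function names are assumptions for illustration only; the actual memory layout of the authors' implementation is not described here.

```python
from math import comb


def gap_matrix_entries(n):
    """Number of index tuples (i, k, l, j) with 1 <= i <= k <= l <= j <= n.

    Assumption: one stored value per such tuple. This count equals
    C(n + 3, 4), which grows as n^4 / 24 for large n.
    """
    return comb(n + 3, 4)


def estimated_bytes(n, n_matrices=4, bytes_per_entry=4):
    """Rough storage estimate for all gap matrices (hypothetical sizes)."""
    return n_matrices * bytes_per_entry * gap_matrix_entries(n)


if __name__ == "__main__":
    for n in (50, 100, 200):
        print(n, gap_matrix_entries(n), estimated_bytes(n) / 1e6, "MB")
```

Under these assumptions, doubling the sequence length multiplies storage by roughly 16, which is consistent with the O(N^4) bound and with the summary's focus on small RNAs.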
### Result verification
The authors tested the algorithm on several small RNA sequences, some known to contain pseudoknots and some known not to. For RNAs known to contain pseudoknots, the algorithm successfully predicts the pseudoknotted structure; for RNAs without pseudoknots, it does not introduce spurious pseudoknots, and its results are similar to those predicted by MFOLD.
In conclusion, this paper proposes a new dynamic programming algorithm that extends optimal RNA secondary structure prediction to pseudoknotted structures, providing a useful tool for studying functional RNA molecules.