α-HMM: A Graphical Model for RNA Folding

Sixiang Zhang, Aaron J. Yang, Liming Cai
2024-01-08
Abstract: RNA secondary structure is modeled with the novel arbitrary-order hidden Markov model (α-HMM). The α-HMM extends the traditional HMM with the capability to model stochastic events that may be influenced by historically distant ones, making it suitable to account for the long-range canonical base pairings between nucleotides that constitute RNA secondary structure. Unlike previous heavyweight extensions of the HMM, the α-HMM has the flexibility to restrict how one event may influence another in a stochastic process, enabling efficient prediction of RNA secondary structure, including pseudoknots.
Biomolecules, Machine Learning
What problem does this paper attempt to address?
This paper addresses a key challenge in RNA secondary structure prediction: how to effectively model and predict RNA secondary structures that include pseudoknots. Traditional prediction methods, such as energy-minimization-based models and stochastic context-free grammar (SCFG)-based methods, achieve high accuracy on secondary structures without pseudoknots but are limited when pseudoknots are present. They cannot effectively handle long-distance dependencies, especially crossed base pairs, so they perform poorly on complex RNA structures.

To solve this problem, the authors propose the arbitrary-order hidden Markov model (α-HMM), a new graphical model that extends the traditional HMM. The α-HMM introduces additional edges, called influence edges, to represent the influence of historical events on current events. This lets it better capture long-distance dependencies in RNA secondary structures, especially pseudoknots. The approach keeps the flexibility of the HMM and can improve prediction accuracy for RNA secondary structures, including those with pseudoknots, while remaining computationally efficient.

Specifically, the α-HMM addresses RNA secondary structure prediction in the following ways:

1. **Long-distance Dependency Modeling**: The α-HMM allows historical influence between events, so the model can capture interactions between nucleotides that are far apart in the RNA sequence, which is crucial for modeling pseudoknots.
2. **Flexible Influence Mechanism**: Unlike traditional high-order HMMs, the α-HMM allows restrictions to be imposed on the influence between events, which makes the model more efficient and accurate on complex RNA structures.
3. **Efficient Decoding Algorithm**: The authors propose a dynamic programming algorithm for decoding the most likely secondary structure of an input RNA sequence. Its time complexity is O(n^3), which is more efficient than existing RNA pseudoknot prediction algorithms.

In summary, the main contribution of this paper is a new framework that can predict RNA secondary structures, including pseudoknots, more accurately while maintaining computational efficiency, providing a powerful tool for the study of RNA function.
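To make the notion of crossed base pairs concrete, the following minimal Python sketch checks whether a set of base pairs contains a pseudoknot, that is, two pairs (i, j) and (k, l) with i < k < j < l. It is an illustration only and is not the α-HMM or its decoding algorithm. The function names (`is_canonical`, `crossing_pairs`, `has_pseudoknot`) and the example index lists are hypothetical. Treating the G-U wobble pair as canonical alongside the Watson-Crick pairs is a common convention assumed here, not something stated in the summary.

```python
from itertools import combinations

# Assumed convention: Watson-Crick pairs plus the G-U wobble pair count as canonical.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def is_canonical(seq, i, j):
    """Return True if positions i and j of seq (0-based) form a canonical pair."""
    return (seq[i], seq[j]) in CANONICAL


def crossing_pairs(pairs):
    """Return every two base pairs (i, j), (k, l) that cross, i.e. i < k < j < l."""
    # Normalize each pair to (smaller, larger) and sort by the opening position,
    # so the first pair of any combination always opens first.
    ordered = sorted((min(p), max(p)) for p in pairs)
    return [
        ((i, j), (k, l))
        for (i, j), (k, l) in combinations(ordered, 2)
        if i < k < j < l
    ]


def has_pseudoknot(pairs):
    """A structure is pseudoknotted if at least two of its base pairs cross."""
    return bool(crossing_pairs(pairs))


# Hypothetical structures given as lists of 0-based position pairs.
nested = [(0, 9), (1, 8), (2, 7)]    # properly nested stem, no crossing
knotted = [(0, 6), (1, 5), (3, 9)]   # (3, 9) crosses both (0, 6) and (1, 5)

print(has_pseudoknot(nested))
print(has_pseudoknot(knotted))
print(crossing_pairs(knotted))
```

The pairwise check is quadratic in the number of base pairs, which is fine for illustration. Nested structures pass it with no crossings, while any crossing requires a model that can relate positions beyond a strictly nested pattern.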