An efficient algorithm to compute the minimum free energy of interacting nucleic acid strands

Ahmed Shalaby, Damien Woods
2024-07-13
Abstract: The information-encoding molecules RNA and DNA form a combinatorially large set of secondary structures through nucleic acid base pairing. Thermodynamic prediction algorithms predict favoured, or minimum free energy (MFE), secondary structures, and can assign an equilibrium probability to any structure via the partition function: a Boltzmann-weighted sum over the set of secondary structures. MFE is NP-hard in the presence of pseudoknots, base pairings that violate a restricted planarity condition. However, unpseudoknotted structures are amenable to dynamic programming: for a single DNA/RNA strand there are polynomial time algorithms for MFE and partition function. For multiple strands, the problem is more complicated due to entropic penalties. Dirks et al. [SICOMP Review; 2007] showed that for O(1) strands, with N bases, there is a partition function algorithm that runs in time polynomial in N; however, their technique did not generalise to MFE, which they left open. We give the first polynomial time (O(N^4)) algorithm for unpseudoknotted multiple (O(1)) strand MFE, answering the open problem from Dirks et al. The challenge lies in considering rotational symmetry of secondary structures, a feature not immediately amenable to dynamic programming algorithms. Our proof has two main technical contributions: First, a polynomial upper bound on the number of symmetric secondary structures to be considered when computing rotational symmetry penalties. Second, that bound is leveraged by a backtracking algorithm to find the MFE in an exponential space of contenders. Our MFE algorithm has the same asymptotic run time as Dirks et al.'s partition function algorithm, suggesting efficient handling of rotational symmetry, although with higher space complexity. It also seems reasonably tight in the number of strands since Condon, Hajiaghayi & Thachuk [DNA27, 2021] have shown that unpseudoknotted MFE is NP-hard for O(N) strands.
Data Structures and Algorithms, Computational Complexity, Discrete Mathematics, Biological Physics, Biomolecules
What problem does this paper attempt to address?
The paper asks **how to efficiently compute the minimum free energy (MFE) secondary structure of multiple interacting nucleic acid strands (DNA or RNA) in the unpseudoknotted case**. It addresses two main problems:

1. **MFE prediction for multiple strands**
   - For a single nucleic acid strand, existing algorithms compute the MFE in polynomial time. With multiple strands, the problem is more complicated because of additional entropic penalties.
   - For a constant number of strands, \(O(1)\), Dirks et al. (2007) gave a polynomial-time partition function algorithm, but their technique did not extend to MFE, which they left as an open problem.
2. **Handling rotational symmetry**
   - With multiple strands, rotational symmetry is a global property of a secondary structure that changes its free energy: a structure with rotational symmetry degree \(R\) incurs a term \(k_BT\log R\).
   - An algorithm that ignores this term assigns symmetric structures the wrong free energy and can therefore report the wrong MFE structure. Because symmetry is a global property, it is not immediately amenable to dynamic programming, so it has to be handled separately.

### Main contributions of the paper

- **A new polynomial-time algorithm**: The paper gives the first polynomial-time algorithm for unpseudoknotted MFE prediction with a constant number of strands, with a running time of \(O(N^4(c - 1)!)\), where \(N\) is the total number of bases in all strands and \(c\) is the number of strands. For constant \(c\) this is the \(O(N^4)\) bound stated in the abstract.
- **Handling rotational symmetry**: Using the concept of "pizza cuts", the paper proves a polynomial upper bound on the number of symmetric structures that need to be considered, and uses a backtracking algorithm to find the true MFE structure.
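To make the notion of rotational symmetry concrete, the following is a minimal brute-force sketch that counts how many rotations of a circular strand ordering map a given structure onto itself. It is not the paper's algorithm, which searches over structures. The representation is an assumption made for illustration: each strand is labelled with a type string, a base is a `(strand_index, position)` tuple, and the function name `rotational_symmetry_degree` is hypothetical.

```python
from typing import Iterable, List, Tuple

Base = Tuple[int, int]  # (strand index in the circular order, position within the strand)


def rotational_symmetry_degree(
    strand_types: List[str],
    pairs: Iterable[Tuple[Base, Base]],
) -> int:
    """Count rotations of the circular strand order that preserve the structure.

    Assumed representation (illustrative only): strand_types[s] names the
    sequence type of the s-th strand around the circle, and each base pair
    joins two (strand, position) bases.
    """
    c = len(strand_types)
    pair_set = {frozenset(p) for p in pairs}  # unordered base pairs
    degree = 0
    for k in range(c):
        # A rotation by k strands must map every strand onto a strand of the same type.
        if any(strand_types[s] != strand_types[(s + k) % c] for s in range(c)):
            continue
        # Apply the rotation to every base of every pair.
        rotated = {
            frozenset(((s + k) % c, i) for (s, i) in pair) for pair in pair_set
        }
        if rotated == pair_set:
            degree += 1  # k = 0 (the identity) always counts
    return degree


# Two identical strands bound symmetrically: rotating by one strand maps the structure to itself.
symmetric = [((0, 0), (1, 3)), ((1, 0), (0, 3))]
print(rotational_symmetry_degree(["A", "A"], symmetric))  # 2

# The same strands with a single base pair: only the identity rotation works.
print(rotational_symmetry_degree(["A", "A"], [((0, 0), (1, 3))]))  # 1
```

Checking one structure this way is cheap. The difficulty the paper addresses is that the MFE search ranges over exponentially many structures, and the symmetry of each candidate is only known once the whole structure is fixed.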
### Formula representation

The free energy of a secondary structure \(S\) formed by \(c\) strands is

\[
\Delta G(S)=\sum_{l\in S}\Delta G(l)+(c - 1)\,\Delta G_{\text{assoc}}+k_BT\log R
\]

where:

- \(\Delta G(l)\) is the free energy of loop \(l\) of \(S\);
- \(\Delta G_{\text{assoc}}\) is the entropic association penalty incurred for each strand beyond the first;
- \(R\) is the rotational symmetry degree of the secondary structure;
- \(k_B\) is the Boltzmann constant;
- \(T\) is the temperature (in Kelvin).

With the symmetry bound and the backtracking search, the paper resolves the multi-stranded MFE problem left open by Dirks et al.
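As a worked illustration of the formula, the sketch below evaluates \(\Delta G(S)\) from a list of loop energies. Several things here are assumptions rather than content from the paper: the function name `free_energy`, the use of kcal/mol with \(k_B\) in molar units, the natural logarithm, the default temperature of 310.15 K, and all numeric loop and association energies, which are made up.

```python
import math
from typing import Sequence

# Boltzmann constant in molar units (the gas constant), in kcal/(mol*K).
K_B = 0.0019872


def free_energy(
    loop_energies: Sequence[float],  # Delta G(l) for each loop l in S, in kcal/mol
    num_strands: int,                # c
    dg_assoc: float,                 # Delta G_assoc, in kcal/mol
    symmetry_degree: int,            # R >= 1
    temperature_k: float = 310.15,   # T in Kelvin (assumed default, 37 C)
) -> float:
    """Evaluate Delta G(S) = sum of loop energies + (c - 1) * Delta G_assoc + k_B * T * log(R)."""
    if num_strands < 1 or symmetry_degree < 1:
        raise ValueError("need at least one strand and a symmetry degree of at least 1")
    loops = sum(loop_energies)
    association = (num_strands - 1) * dg_assoc
    symmetry = K_B * temperature_k * math.log(symmetry_degree)  # zero when R = 1
    return loops + association + symmetry


# Made-up loop energies for a two-strand complex.
loops = [-3.4, -2.1, 1.5]
asymmetric = free_energy(loops, num_strands=2, dg_assoc=1.96, symmetry_degree=1)
symmetric = free_energy(loops, num_strands=2, dg_assoc=1.96, symmetry_degree=2)

# With the same loop energies, the symmetric structure is less favourable by k_B * T * ln 2.
print(round(symmetric - asymmetric, 3))
```

Since the symmetry term is non-negative, a symmetric structure with the lowest loop energy sum is not necessarily the MFE structure once the penalty is added, which is why symmetric candidates need separate treatment.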