RNAprofiling 2.0: Enhanced cluster analysis of structural ensembles

Forrest Hurley, Christine Heitsch
DOI: https://doi.org/10.1016/j.jmb.2023.168047
2023-03-28
Abstract: Understanding the base pairing of an RNA sequence provides insight into its molecular structure. By mining suboptimal sampling data, RNAprofiling 1.0 identifies the dominant helices in low-energy secondary structures as features, organizes them into profiles which partition the Boltzmann sample, and highlights key similarities/differences among the most informative, i.e. selected, profiles in a graphical format. Version 2.0 enhances every step of this approach. First, the featured substructures are expanded from helices to stems. Second, profile selection includes low-frequency pairings similar to featured ones. In conjunction, these updates extend the utility of the method to sequences up to length 600, as evaluated over a sizable dataset. Third, relationships are visualized in a decision tree which highlights the most important structural differences. Finally, this cluster analysis is made accessible to experimental researchers in a portable format as an interactive webpage, permitting a much greater understanding of trade-offs among different possible base pairing combinations.
Biomolecules
What problem does this paper attempt to address?
This paper aims to improve cluster analysis of RNA secondary structures, particularly for Boltzmann samples of longer sequences (up to 600 nucleotides). It presents RNAprofiling 2.0 (Pv2), an upgrade of RNAprofiling 1.0 (Pv1), which enhances the analysis in four ways:

1. **Expanded featured substructures**: Features are extended from helices to stems, which helps capture more complex structural elements.
2. **Inclusion of low-frequency pairings**: Profile selection considers not only frequently occurring pairings but also lower-frequency pairings that are similar to the featured ones, making the analysis more comprehensive.
3. **Relationship visualization**: Relationships among the selected profiles are shown as a decision tree that highlights the most important structural differences.
4. **Interactive webpage output**: The analysis is delivered as a portable, interactive webpage, so experimental researchers can compare alternative base pairing combinations more intuitively and use them to generate and test hypotheses.

Together, these improvements let RNAprofiling 2.0 handle longer sequences more effectively and provide information to support further computational analysis, experimental testing, and biological insight.
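
The core idea in the abstract (helices as features, profiles that partition the sample) can be illustrated with a toy sketch. This is not the RNAprofiling implementation: the function names, the dot-bracket input, and the fixed frequency cutoff `threshold` are all assumptions made for illustration, and the actual tool selects features and profiles by its own criteria. The sketch also treats helices that differ by a single pair as unrelated, which is the kind of limitation that the move to stems and similar low-frequency pairings is meant to address.

```python
from collections import Counter, defaultdict


def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs (0-indexed) in a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.add((stack.pop(), j))
    return pairs


def helices(dot_bracket):
    """Group stacked pairs (i, j), (i+1, j-1), ... into helices.

    Each helix is reported as (i, j, k): outermost pair (i, j) and length k.
    """
    pairs = base_pairs(dot_bracket)
    found = set()
    for i, j in pairs:
        if (i - 1, j + 1) in pairs:
            continue  # (i, j) is not the outermost pair of its helix
        k = 1
        while (i + k, j - k) in pairs:
            k += 1
        found.add((i, j, k))
    return found


def profile_sample(sample, threshold):
    """Partition a sample by the featured helices each structure contains.

    Assumption: a helix is 'featured' if it occurs in at least `threshold`
    (a fraction) of the sampled structures. This cutoff is hypothetical.
    """
    per_structure = [helices(s) for s in sample]
    counts = Counter(h for hs in per_structure for h in hs)
    features = {h for h, c in counts.items() if c / len(sample) >= threshold}
    profiles = defaultdict(list)
    for s, hs in zip(sample, per_structure):
        # A profile is the set of featured helices present in a structure.
        profiles[frozenset(hs & features)].append(s)
    return features, dict(profiles)


# Toy "sample" of six structures over a 20-nucleotide sequence (made-up data).
A = "((((....))))........"
B = "........((((....))))"
C = "((((....))))..(...)."
sample = [A, A, A, C, B, B]

features, profiles = profile_sample(sample, threshold=0.3)
for profile, members in profiles.items():
    print(sorted(profile), len(members))
```

In this toy sample, structure `C` falls into the same profile as `A` because its extra single-pair helix is too rare to be featured. Structures are grouped by their dominant helices rather than by exact identity.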