BPPart and BPMax: RNA-RNA Interaction Partition Function and Structure Prediction for the Base Pair Counting Model

Ali Ebrahimpour-Boroojeny, Sanjay Rajopadhye, Hamidreza Chitsaz
DOI: https://doi.org/10.48550/arXiv.1904.01235
IF: 6.064
2019-04-02
Biomolecules
Abstract: RNA-RNA interaction (RRI) is ubiquitous and plays complex roles in cellular functions. In human health studies, miRNA-target and lncRNA interactions are among an elite class of RRIs that have been extensively studied. Bacterial ncRNA-target interactions and RNA interference are other classes of RRIs that have received significant attention. In recent studies, instances of mRNA-mRNA interaction have been observed in which both partners appear in the same pathway without any direct link between them or any prior knowledge of their relationship. These recently discovered cases suggest that the scope of RRI is much wider than the aforementioned elite classes. We revisit our RNA-RNA interaction partition function algorithm, piRNA, which computes the partition function, base-pairing probabilities, and structure for the comprehensive Turner energy model using 96 different dynamic programming tables. In this study, we strategically retreat from sophisticated thermodynamic models to the much simpler base pair counting model. That might seem counter-intuitive at first glance; our idea is to benefit from the running time and memory footprint advantages of such simple models and to compensate for the associated information loss by adding machine learning components in the future. Here, simple weighted base pair counting is used to obtain BPPart (Base-pair Partition function) and BPMax (Base-pair Maximization), which use 9 and 2 tables, respectively. They are empirically 225-fold and 1350-fold faster than piRNA, respectively. The correlation with piRNA is 0.855 for BPPart and 0.836 for BPMax at 37 degrees, and 0.920 and 0.904, respectively, at -180 degrees. We also discover two partner RNAs, SNORD3D and TRAF3, and hypothesize their potential roles in genetic diseases. We envision fusing machine learning methods with the proposed algorithms in the future.
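To make "weighted base pair counting" concrete, the following is a minimal sketch of weighted base pair maximization for a single strand, in the style of the classic Nussinov recurrence. It is not the two-strand BPMax algorithm, whose recurrences the abstract does not give. The weight table `PAIR_WEIGHT`, the function name `max_weighted_pairs`, and the `min_loop` parameter are assumptions introduced here for illustration only.

```python
# Illustrative weights per pair type (an assumption, not taken from the abstract).
PAIR_WEIGHT = {
    ("G", "C"): 3, ("C", "G"): 3,
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "U"): 1, ("U", "G"): 1,
}


def max_weighted_pairs(seq, min_loop=0):
    """Maximum total pair weight over non-crossing structures of one strand.

    min_loop is the minimum number of unpaired bases enclosed by a pair
    (hypothetical knob; 0 means adjacent bases may pair).
    """
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j stays unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some base k, splitting the interval.
            for k in range(i, j - min_loop):
                weight = PAIR_WEIGHT.get((seq[k], seq[j]))
                if weight is None:
                    continue
                left = best[i][k - 1] if k > i else 0
                inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                score = max(score, left + weight + inner)
            best[i][j] = score
    return best[0][n - 1]
```

Under these assumed weights, `max_weighted_pairs("GAUC")` returns 5: the outer G-C pair contributes 3 and the nested A-U pair contributes 2. The single table used here is only a one-strand analogue; handling two interacting strands requires tables indexed by intervals on both sequences.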