NNfold: RNA Secondary Structure Prediction by Deep Learning with an Architecture Imposing Contextual Constraining
Christophe Van Neste, Ramzan Umarov, Yu Li, Adil Salhi, Hiroyuki Kuwahara, Xin Gao
DOI: https://doi.org/10.2139/ssrn.3813288
2021-01-01
SSRN Electronic Journal
Abstract: Ribonucleic acid (RNA) secondary structures are the determining factor for the many roles RNA plays in life. They can be obtained experimentally by techniques such as X-ray diffraction and NMR imaging. However, because experimental methods are laborious and expensive, they are not suited to high-throughput analysis. Computational prediction algorithms complement them by predicting secondary structures for the multitude of known RNA sequences that lack structural information. Here, we introduce NNfold, a novel sequence-based deep neural network method to predict RNA secondary structures. Predictions are made by combining a local and a global model: first, we construct a matrix with the pairing likelihood of each nucleotide by predicting all potential interactions using a convolutional deep learning model. Next, we modify the list of base pairs obtained from the matrix using a second model, whose output is used to ensure the contextual validity of the predicted secondary structure. Within the RNA STRAND database, NNfold performed much better than thermodynamics-based methods on a diverse set of RNA sequences, improving the average F1 score by 0.20. It is also capable of predicting pseudoknots, which is a challenging task for other approaches. We also extracted the learned thermodynamic features from the model, which can help advance the construction of new biological models to predict RNA secondary structures. Our method is available as an online service at http://www.cbrc.kaust.edu.sa/NNfold/, or as an installable package at https://github.com/ramzan1990/NNfold.
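To make the two-step idea in the abstract concrete, the following is a minimal Python sketch, not NNfold's actual code. It assumes a pairing-likelihood matrix is already available (in NNfold this would come from the convolutional model) and substitutes a simple greedy rule for the second model: each nucleotide may appear in at most one pair. The function names `select_pairs` and `base_pair_f1`, the `threshold` parameter, and the base-pair-level definition of F1 are all illustrative assumptions rather than details taken from the paper.

```python
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def select_pairs(likelihood: List[List[float]], threshold: float = 0.5) -> Set[Pair]:
    """Greedily pick base pairs from a pairing-likelihood matrix.

    Stand-in for a contextual-validity step (an assumption, not the
    paper's second model): pairs are taken in order of decreasing score,
    and a position already used by a chosen pair cannot pair again.
    No nesting constraint is applied, so crossing pairs (pseudoknots)
    are allowed.
    """
    n = len(likelihood)
    candidates = [
        (likelihood[i][j], i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if likelihood[i][j] >= threshold
    ]
    candidates.sort(reverse=True)  # highest-scoring candidates first

    used: Set[int] = set()
    pairs: Set[Pair] = set()
    for _score, i, j in candidates:
        if i not in used and j not in used:
            pairs.add((i, j))
            used.update((i, j))
    return pairs


def base_pair_f1(predicted: Set[Pair], reference: Set[Pair]) -> float:
    """F1 score over base pairs (harmonic mean of precision and recall)."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)
```

Given a matrix in which (0, 5) and (0, 4) both score highly, the greedy rule keeps only the stronger of the two, because position 0 can pair only once. The resulting pair set can then be compared with a reference structure using `base_pair_f1`.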