Accurate RNA 3D Structure Prediction Using a Language Model-Based Deep Learning Approach
Tao Shen, Zhihang Hu, Siqi Sun, Di Liu, Felix Wong, Jiuming Wang, Jiayang Chen, Yixuan Wang, Liang Hong, Jin Xiao, Liangzhen Zheng, Tejas Krishnamoorthi, Irwin King, Sheng Wang, Peng Yin, James J Collins, Yu Li
DOI: https://doi.org/10.1038/s41592-024-02487-0
IF: 48
2024-01-01
Nature Methods
Abstract: Accurate prediction of RNA three-dimensional (3D) structures remains an unsolved challenge. Determining RNA 3D structures is crucial for understanding their functions and informing RNA-targeting drug development and synthetic biology design. The structural flexibility of RNA, which leads to the scarcity of experimentally determined data, complicates computational prediction efforts. Here we present RhoFold+, an RNA language model-based deep learning method that accurately predicts 3D structures of single-chain RNAs from sequences. By integrating an RNA language model pretrained on ~23.7 million RNA sequences and leveraging techniques to address data scarcity, RhoFold+ offers a fully automated end-to-end pipeline for RNA 3D structure prediction. Retrospective evaluations on RNA-Puzzles and CASP15 natural RNA targets demonstrate the superiority of RhoFold+ over existing methods, including human expert groups. Its efficacy and generalizability are further validated through cross-family and cross-type assessments, as well as time-censored benchmarks. Additionally, RhoFold+ predicts RNA secondary structures and interhelical angles, providing empirically verifiable features that broaden its applicability to RNA structure and function studies.