UNI-RNA: UNIVERSAL PRE-TRAINED MODELS REVOLUTIONIZE RNA RESEARCH

Guolin Ke, Ruichu Gu, Zhiyuan Chen, Han Wen, Xi Wang, Yongge Li, Xiaohong Ji
DOI: https://doi.org/10.1101/2023.07.11.548588
2023-07-12
bioRxiv
Abstract: RNA molecules play a crucial role as intermediaries in diverse biological processes. Attaining a profound understanding of their function can substantially enhance our comprehension of life’s activities and facilitate drug development for numerous diseases. The advent of high-throughput sequencing technologies has made vast amounts of RNA sequence data accessible, and these data contain invaluable information and knowledge. However, deriving insights for further application from such an immense volume of data poses a significant challenge. Fortunately, recent advancements in pre-trained models have surfaced as a revolutionary solution for addressing such challenges, owing to their exceptional ability to automatically mine and extract hidden knowledge from massive datasets. Inspired by these past successes, we developed a novel context-aware deep learning model named Uni-RNA that performs pre-training on the largest dataset of RNA sequences to date, at an unprecedented scale. During this process, our model autonomously unraveled the obscured evolutionary and structural information embedded within the RNA sequences. As a result, through fine-tuning, our model achieved state-of-the-art (SOTA) performance in a spectrum of downstream tasks, including both structural and functional predictions. Overall, Uni-RNA established a new research paradigm empowered by the large pre-trained model in the field of RNA, enabling the community to unlock the power of AI at a whole new level to significantly expedite the pace of research and foster groundbreaking discoveries.
Biology, Computer Science
What problem does this paper attempt to address?
The paper aims to address the following issues:

1. **RNA Structure and Function Prediction**: By developing a novel context-aware deep learning model named Uni-RNA, it extracts hidden evolutionary and structural information from large-scale RNA sequence data to accurately predict RNA secondary and tertiary structures as well as functions. Uni-RNA achieves state-of-the-art performance on various downstream tasks.
2. **mRNA-related Tasks**: Specifically, it predicts ribosome load based on 5'UTR sequences to optimize protein expression and predicts the proportion of APA (alternative polyadenylation) isoforms corresponding to 3'UTR sequences, thereby better regulating gene expression.
3. **Revealing the Relationship Between Sequence and Function**: By enhancing sequence representation capabilities through pre-trained models, it improves the performance of downstream tasks, including cross-species splice site prediction and functional classification of non-coding RNA (ncRNA), maintaining high accuracy even under boundary noise conditions.
4. **RNA Modification Site Prediction**: It develops models capable of handling full-length RNA sequences to accurately identify RNA modification sites, further elucidating the functional significance and regulatory mechanisms of RNA molecules.

In summary, this paper leverages the large-scale pre-trained model Uni-RNA to address multiple key challenges in RNA research, providing a new research paradigm for RNA science.