Abstract: The complex 'language' of plant RNA encodes a vast array of regulatory elements that orchestrate crucial aspects of plant growth, development, and adaptation to environmental stresses. Recent advances in foundation models (FMs) have demonstrated their potential to decipher complex 'languages' in biology. In this study, we introduce PlantRNA-FM, a high-performance and interpretable RNA FM designed around RNA features, including both sequence and structure. PlantRNA-FM was pre-trained on an extensive dataset integrating RNA sequences and RNA structure information from 1,124 distinct plant species. PlantRNA-FM exhibits superior performance in plant-specific downstream tasks, such as plant RNA annotation prediction and RNA translation efficiency (TE) prediction. Compared with the second-best FMs, PlantRNA-FM improved the F1 score by up to 52.45% in RNA genic region annotation prediction and by up to 15.30% in TE prediction. PlantRNA-FM is paired with an interpretable framework that facilitates the identification of biologically functional RNA sequence and structure motifs, including both RNA secondary and tertiary structure motifs, across transcriptomes. Through experimental validation, we revealed novel translation-associated RNA motifs in plants. PlantRNA-FM also highlighted the importance of the positions of these functional RNA motifs within genic regions. Taken together, PlantRNA-FM facilitates the exploration of functional RNA motifs across the complexity of transcriptomes, giving plant scientists new capabilities for programming RNA codes in plants.

PlantRNA-FM: An Interpretable RNA Foundation Model for Exploration Functional RNA Motifs in Plants
