Deep generative models generate mRNA sequences with enhanced translation capacity and stability

He Zhang, Hailong Liu, Yushan Xu, Yiming Liu, Jia Wang, Yan Qin, Haiyan Wang, Lili Ma, Zhiyuan Xun, Timothy K. Lu, Jicong Cao
DOI: https://doi.org/10.1101/2024.06.20.599727
2024-06-20
Abstract: Despite the tremendous success of messenger RNA (mRNA) COVID-19 vaccines, extending this modality to a broader spectrum of diseases requires substantial improvements, particularly in designing mRNAs with higher expression levels and longer durability. Here we present GEMORNA, a deep generative model that designs novel mRNA coding sequences (CDSs) and untranslated regions (UTRs) with superior translation capacity. The task is comparable in sophistication to language translation and to free-form poetry composition with accurate grammar and semantics. Our AI model was trained on an extensive collection of RNA sequences from diverse families and further refined with labeled data. Remarkably, our AI-generated mRNAs exhibited 8.2-fold and 15.9-fold increases in firefly luciferase expression compared with benchmark mRNAs in two different cell types. Additionally, our AI-designed COVID-19 mRNA vaccine elicited a 4-fold increase in anti-COVID antibody titer in mice relative to BNT162b2. Furthermore, GEMORNA's versatility extends to circular mRNA design, where we achieved a 27-fold increase in human erythropoietin protein expression in vivo compared with a systematically optimized benchmark sequence. We also created circular mRNAs with substantial improvements in expression level, durability, and anti-tumor cell cytotoxicity in mRNA-transduced CAR-T cells compared with an experimentally validated benchmark. In summary, GEMORNA generates novel mRNA sequences with significant performance improvements and has the potential to enable a wide range of therapeutic and vaccine applications.
Synthetic Biology
What problem does this paper attempt to address?
The paper addresses several key challenges in mRNA sequence design, with the goal of improving translation efficiency and stability. Specifically:

1. **Translation Efficiency and Stability**: mRNA vaccines, such as those for COVID-19, have been highly successful. Applying mRNA to a broader range of diseases still requires significant advances in mRNA design, particularly higher expression levels and longer durability.
2. **Coding Sequence (CDS) Optimization**: Traditional methods, such as maximizing the Codon Adaptation Index (CAI), work well for local optimization but are limited in global optimization. Existing LSTM-based models lack attention mechanisms, so they handle long-range dependencies poorly and are inefficient to train.
3. **Untranslated Region (UTR) Design**: Designing the 5' UTR is particularly challenging because its regulatory mechanisms are not fully understood. Existing methods typically reuse 5' UTRs from natural mRNAs or improve translation efficiency by minimizing secondary structure, but these approaches remain limited.

To address these issues, the authors developed GEMORNA, a generative model that designs novel mRNA sequences with higher translation efficiency and stability. In experiments, GEMORNA-generated mRNAs showed markedly higher expression in different cell types and elicited higher antibody titers in mice. GEMORNA has also been applied to circular mRNA (circRNA) design. There it further increased protein expression levels and the anti-tumor cytotoxicity of mRNA-transduced CAR-T cells.
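Item 2 names CAI optimization as a traditional, local approach to CDS design. The following minimal Python sketch illustrates what that metric measures, using the standard CAI definition.

Each codon gets a relative adaptiveness w: its count in a reference set of highly expressed genes, divided by the count of the most-used synonymous codon. The CAI of a sequence is then the geometric mean of w over its informative codons, CAI = exp((1/L) Σ ln w_i). Met, Trp and stop codons are excluded because they offer no synonymous choice.

This is not GEMORNA's method. Several details are illustrative assumptions rather than taken from the paper:

- the toy reference sequence;
- the 0.5 pseudocount for unseen codons;
- the helper names `relative_adaptiveness` and `cai`.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Single-codon amino acids (Met, Trp) and stop codons carry no codon-choice
# information, so they are left out of the CAI.
SKIP = {"M", "W", "*"}


def codons(seq):
    """Split a DNA/RNA sequence into in-frame codons (U is treated as T)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """w(codon) = count(codon) / max count among its synonymous codons.

    Hypothetical helper: the pseudocount (an assumption) keeps unseen codons
    from getting w = 0, which would make log(w) undefined.
    """
    counts = Counter(c for s in reference_seqs for c in codons(s))
    synonyms = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)
    weights = {}
    for aa, syn in synonyms.items():
        if aa in SKIP:
            continue
        adjusted = {c: counts[c] + pseudocount for c in syn}
        best = max(adjusted.values())
        for c in syn:
            weights[c] = adjusted[c] / best
    return weights


def cai(seq, weights):
    """Geometric mean of w over the informative codons of seq (hypothetical helper)."""
    logs = [math.log(weights[c]) for c in codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence has no informative codons")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    weights = relative_adaptiveness(["CTGCTGCTG"])  # hypothetical reference set
    print(round(cai("CTGTTA", weights), 3))  # prints 0.378
```

With `CTGCTGCTG` as the only reference, CTG becomes the preferred leucine codon (w = 1). Every other leucine codon gets w = 0.5/3.5 = 1/7, so `cai("CTGTTA", weights)` is √(1/7) ≈ 0.378.

Because CAI scores each codon independently of its neighbors, maximizing it simply picks the locally best codon at every position. That is the kind of local optimization the summary contrasts with global sequence design.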