Deep Generative Optimization of mRNA Codon Sequences for Enhanced Protein Production and Therapeutic Efficacy

Yupeng Li, Fan Wang, Jiaqi Yang, Zirong Han, Linfeng Chen, Wenbing Jiang, Hao Zhou, Tong Li, Zehua Tang, Jianxiang Deng, Xin He, Gaofeng Zha, Jiekai Hu, Yong Hu, Linping Wu, Changyou Zhan, Caijun Sun, Yao He, Zhi Xie
DOI: https://doi.org/10.1101/2024.09.06.611590
2024-09-08
Abstract: Messenger RNA (mRNA) therapeutics show immense promise, but their efficacy is limited by suboptimal protein expression. Here, we present RiboCode, a deep learning framework that generates mRNA codon sequences for enhanced protein production. RiboCode introduces several advances, including direct learning from large-scale ribosome profiling data, context-aware mRNA optimization, and generative exploration of a large sequence space. In silico analyses demonstrate RiboCode's robust predictive accuracy for unseen genes and cellular environments. In vitro experiments show substantial improvements in protein expression, with up to a 72-fold increase, significantly outperforming past methods. In addition, RiboCode achieves cell-type-specific expression and demonstrates robust performance across different mRNA formats, including m1Ψ-modified and circular mRNAs, an important feature for mRNA therapeutics. In vivo mouse studies show that optimized influenza hemagglutinin mRNAs induce ten times stronger neutralizing antibody responses against influenza virus compared to the unoptimized sequence. In an optic nerve crush model, optimized nerve growth factor mRNAs achieve equivalent neuroprotection of retinal ganglion cells at one-fifth the dose of the unoptimized sequence. Collectively, RiboCode represents a paradigm shift from a rule-based to a data-driven, context-sensitive approach for mRNA therapeutic applications, enabling the development of more potent and dose-efficient treatments.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses the problem of low protein expression in mRNA therapies. Although mRNA therapies show great potential for treating disease, improving the translation efficiency of mRNA molecules delivered into cells remains a key challenge. To this end, the research team developed RiboCode, a deep learning framework that optimizes mRNA codon sequences to increase protein production. The main contributions of RiboCode are:

1. **Learning directly from large-scale ribosome profiling data (Ribo-seq)**: The model is trained on measured translation data, which lets it capture complex mRNA translation patterns.
2. **Context-aware mRNA optimization**: The optimization takes the cellular environment into account, so sequences can be tailored to a target cell type.
3. **Exploring the vast sequence space**: Generative methods are used to search across many synonymous codon combinations for sequences with higher predicted expression.

Experimental results show that RiboCode improves protein expression across multiple cell types and mRNA formats (including m1Ψ-modified and circular mRNA), with in vitro gains of up to 72-fold. For example, optimized influenza virus hemagglutinin (HA) mRNA induced a tenfold stronger neutralizing antibody response in mice than the non-optimized sequence. In an optic nerve crush model, optimized nerve growth factor (NGF) mRNA achieved the same neuroprotective effect as the non-optimized sequence at one-fifth the dose. These results suggest that data-driven, context-aware codon optimization can make mRNA therapeutics more potent and more dose-efficient.
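To give a sense of how large the sequence space mentioned above is, the following minimal sketch counts how many distinct coding sequences encode a given protein under the standard genetic code. It is not part of RiboCode and does not reflect the authors' method; the function names and the example peptide are illustrative assumptions, and only the per-amino-acid codon counts are standard.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code
# (one-letter amino acid codes; stop codons are not included).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_sequences(protein: str) -> int:
    """Count the distinct codon sequences that encode `protein`.

    Hypothetical helper for illustration: the count is the product of the
    number of synonymous codons available at each position.
    """
    return math.prod(CODON_DEGENERACY[aa] for aa in protein.upper())


def log10_synonymous_sequences(protein: str) -> float:
    """Same quantity on a log10 scale, convenient for long proteins."""
    return sum(math.log10(CODON_DEGENERACY[aa]) for aa in protein.upper())


if __name__ == "__main__":
    peptide = "MKL"  # illustrative three-residue peptide, not from the paper
    print(count_synonymous_sequences(peptide))  # 1 * 2 * 6 = 12
    print(round(log10_synonymous_sequences(peptide), 3))
```

Because the count is a product over positions, it grows exponentially with protein length, which is why exhaustive enumeration is impractical for real proteins and a generative search is a natural fit.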