DeepFM-Crispr: Prediction of CRISPR On-Target Effects via Deep Learning

Condy Bao, Fuxiao Liu
2024-09-10
Abstract: Since the advent of CRISPR-Cas9, a groundbreaking gene-editing technology that enables precise genomic modifications via a short RNA guide sequence, there has been a marked increase in the accessibility and application of this technology across various fields. The success of CRISPR-Cas9 has spurred further investment and led to the discovery of additional CRISPR systems, including CRISPR-Cas13. Distinct from Cas9, which targets DNA, Cas13 targets RNA, offering unique advantages for gene modulation. We focus on Cas13d, a variant known for its collateral activity where it non-specifically cleaves adjacent RNA molecules upon activation, a feature critical to its function. We introduce DeepFM-Crispr, a novel deep learning model developed to predict the on-target efficiency and evaluate the off-target effects of Cas13d. This model harnesses a large language model to generate comprehensive representations rich in evolutionary and structural data, thereby enhancing predictions of RNA secondary structures and overall sgRNA efficacy. A transformer-based architecture processes these inputs to produce a predictive efficacy score. Comparative experiments show that DeepFM-Crispr not only surpasses traditional models but also outperforms recent state-of-the-art deep learning methods in terms of prediction accuracy and reliability.
Quantitative Methods, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses two problems for the CRISPR-Cas13d system: predicting targeting efficiency and evaluating off-target effects. To that end it introduces DeepFM-Crispr, a deep learning model for both tasks.

### Main Issues:

1. **Targeting Efficiency Prediction**: How can the targeting efficiency of Cas13d on a specific RNA sequence be predicted accurately?
2. **Off-Target Effect Evaluation**: How can the off-target effects of Cas13d be evaluated effectively, especially its impact on non-coding RNA?

### Background:

- **CRISPR-Cas9**: A widely used gene-editing technology that achieves precise genome modification through short RNA guide sequences.
- **CRISPR-Cas13**: Unlike Cas9, Cas13 targets RNA instead of DNA, which offers distinct advantages for gene regulation. The paper focuses on the Cas13d variant, which non-specifically cleaves adjacent RNA molecules upon activation, a feature crucial to its function.

### Solution:

- **DeepFM-Crispr**: Uses a large language model to generate representations rich in evolutionary and structural information, then processes these inputs with a transformer-based architecture to produce a predicted efficiency score.
- **Model Features**:
  - **Data Representation**: Uses one-hot encoding to convert sgRNA sequences into binary vectors.
  - **RNA Large Language Model**: Uses RNA-FM to extract latent features of RNA sequences, with attention mechanisms capturing contextual information.
  - **Secondary Structure Prediction**: Uses a ResNet model to predict the secondary structure of the sgRNA.
  - **Feature Integration and Processing**: Integrates the outputs of RNA-FM and the secondary-structure ResNet, then processes the combined features with a DenseNet architecture.
  - **Efficiency Prediction**: Uses a multi-layer perceptron (MLP) for the final sgRNA efficiency prediction.
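
The one-hot encoding step listed under Data Representation can be illustrated with a minimal sketch. The channel order `ACGU`, the function name `one_hot_encode`, and the handling of `T` as `U` are assumptions made for this example; the summary does not state how the authors order channels or treat ambiguous bases.

```python
from typing import List

# Assumed channel order; the source does not specify one.
ALPHABET = "ACGU"


def one_hot_encode(seq: str) -> List[List[int]]:
    """Return one 4-element binary row per nucleotide in `seq`."""
    # Accept lowercase and DNA-style input by mapping T to U (an assumption).
    seq = seq.upper().replace("T", "U")
    rows = []
    for base in seq:
        if base not in ALPHABET:
            # Ambiguous bases such as N are rejected in this sketch.
            raise ValueError(f"unexpected base {base!r}")
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows


# A hypothetical 4-nt fragment, used only to show the output shape.
encoded = one_hot_encode("ACGU")
```

Each row has exactly one nonzero entry, so a guide of length n becomes an n-by-4 binary matrix that can be fed to the downstream layers.
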
### Experimental Results:

- **Prediction Accuracy**: DeepFM-Crispr achieves higher R² values and stronger negative Pearson correlation coefficients when predicting sgRNA efficiency.
- **Classification Task**: In binary classification, DeepFM-Crispr matches the highest AUC, achieved by DeepCas13, and significantly outperforms other methods on the area under the precision-recall curve (AUPR).

### Significance:

- **Gene Editing Applications**: DeepFM-Crispr improves the accuracy of sgRNA efficiency prediction, which supports precise gene editing, especially in therapeutic applications where the precision of the modification is crucial to treatment outcomes.

Taken together, DeepFM-Crispr addresses shortcomings of existing tools for predicting Cas13d targeting efficiency and evaluating off-target effects, and it provides a new tool for future gene editing research.
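
For readers unfamiliar with the metrics named under Experimental Results, the sketch below gives their textbook definitions in plain Python. It is not the authors' evaluation code: the function names are hypothetical, the toy labels and scores are invented for illustration, and AUPR is approximated by average precision, assuming at least one positive label and no tied scores.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; negative when x and y move in opposite directions."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at each positive, ranked by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Invented toy data, used only to exercise the functions.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

In practice a library such as scikit-learn or SciPy would normally be used for these metrics; the explicit versions here only make the definitions concrete.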