Abstract: Off-target activity in the CRISPR-Cas9 system remains a formidable barrier to its broader application and development. Recent work has highlighted the potential of deep learning models for predicting these off-target effects, yet such models face significant hurdles, including dataset imbalance and the complexity of encoding schemes and model architectures. To address these challenges, our study introduces an Efficiency and Specificity-Based (ESB) class rebalancing strategy designed specifically for datasets containing mismatches-only off-target instances, marking a first in this area. Furthermore, by evaluating various one-hot encoding schemes alongside numerous hybrid neural network models, we find that encodings and models of moderate complexity best balance performance and efficiency. On this foundation, we propose a novel hybrid model, CRISPR-MCA, which uses multi-feature extraction to improve predictive accuracy. The empirical results show that the ESB class rebalancing strategy surpasses five conventional methods in handling extreme dataset imbalance, demonstrating superior efficacy and broader applicability across diverse models. Notably, the CRISPR-MCA model excels at off-target effect prediction across four distinct mismatches-only datasets and significantly outperforms contemporary state-of-the-art models on datasets comprising both mismatches and indels. In summary, the CRISPR-MCA model, coupled with the ESB rebalancing strategy, offers valuable insights and a robust framework for future work in this field.

In the field of gene editing, deep learning holds significant promise for predicting off-target effects in the CRISPR-Cas9 system.
Nevertheless, one of the primary challenges is the extreme class imbalance within off-target datasets, which severely hampers predictive accuracy for the under-represented classes. Furthermore, as sequence encoding methods continue to evolve, model complexity has increased correspondingly. To address these issues, we introduce a novel Efficiency and Specificity-Based (ESB) class rebalancing strategy designed to mitigate the impact of class imbalance. Additionally, we assess the influence of six encoding schemes and four distinct architectural approaches on prediction performance, using four benchmark datasets for validation. Building on these insights, we have developed a new hybrid model, termed CRISPR-MCA. Our experimental results demonstrate that the ESB strategy significantly outperforms existing baseline methods across multiple models. Moreover, the CRISPR-MCA model performs robustly on two distinct types of datasets, affirming its effectiveness in improving the accuracy of deep learning predictions of off-target activity.
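
To make the idea of one-hot encoding a mismatches-only sgRNA/target pair concrete, the following is a minimal sketch under stated assumptions. The 8-channel layout (the sgRNA base one-hot concatenated with the target base one-hot) and the function names `one_hot`, `encode_pair`, and `mismatch_positions` are hypothetical choices made for illustration. They are not necessarily any of the six encoding schemes evaluated in the paper, and the example sequences are made up.

```python
# Illustrative sketch only: the channel layout below is an assumption,
# not one of the encoding schemes evaluated in the paper.
BASES = "ACGT"


def one_hot(base):
    """Return a 4-element one-hot list for a single nucleotide."""
    if base not in BASES:
        raise ValueError(f"unsupported base: {base!r}")
    return [1 if base == b else 0 for b in BASES]


def encode_pair(sgrna, target):
    """Encode an aligned sgRNA/target pair (mismatches only, no indels).

    Each position becomes 8 channels: the one-hot of the sgRNA base
    followed by the one-hot of the target-site base.
    """
    if len(sgrna) != len(target):
        raise ValueError("mismatches-only pairs must have equal length")
    return [one_hot(g) + one_hot(t)
            for g, t in zip(sgrna.upper(), target.upper())]


def mismatch_positions(sgrna, target):
    """Return the 0-based positions where the two sequences differ."""
    return [i for i, (g, t) in enumerate(zip(sgrna.upper(), target.upper()))
            if g != t]


if __name__ == "__main__":
    # Hypothetical 23-nt sequences (guide plus PAM), written in the DNA alphabet.
    guide = "GAGTCCGAGCAGAAGAAGAAGGG"
    site = "GAGTCCGAGCAGAAGAAGAAAGG"
    matrix = encode_pair(guide, site)
    print(len(matrix), "positions x", len(matrix[0]), "channels")
    print("mismatch positions:", mismatch_positions(guide, site))
```

A matrix of this shape (sequence length by channels) is the kind of input a convolutional or recurrent model would consume. Richer schemes add channels, for example to mark mismatch direction or indels, which is where the trade-off between encoding complexity and efficiency arises.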

Interpretable CRISPR/Cas9 off-target activities with mismatches and indels prediction using BERT

Prediction of CRISPR/Cas9 Off-Target Activities with Mismatches and Indels Using Stacked BiGRU

CRISPR-M: Predicting sgRNA off-target effect using a multi-view deep learning network

CrnnCrispr: An Interpretable Deep Learning Method for CRISPR/Cas9 sgRNA On-Target Activity Prediction

Overcoming CRISPR-Cas9 off-target prediction hurdles: A novel approach with ESB rebalancing strategy and CRISPR-MCA model

CRSIPR-A-I: a webtool for the efficacy prediction of CRISPR activation and interference

Deep learning improves the ability of sgRNA off-target propensity prediction

CRISPR-DIPOFF: an interpretable deep learning approach for CRISPR Cas-9 off-target prediction

An Interpretable Deep Learning Approach for Predicting CRISPR/Cas9-Mediated Editing Outcomes

Using traditional machine learning and deep learning methods for on- and off-target prediction in CRISPR/Cas9: a review

Prediction of off-target effects of the CRISPR/Cas9 system for design of sgRNA

Prediction of Off-Target Specificity and Cell-Specific Fitness of CRISPR-Cas System Using Attention Boosted Deep Learning and Network-Based Gene Feature

Optimized sgRNA design by deep learning to balance the off-target effects and on-target activity of CRISPR/Cas9

[Prediction of CRISPR/Cas9 off-target activity using multi-scale convolutional neural network]

Learning to quantify uncertainty in off-target activity for CRISPR guide RNAs

Comparative Analysis of Machine Learning Algorithms for Predicting On-Target and Off-Target Effects of CRISPR-Cas13d for gene editing

Navigating the CRISPR-Cas9 Frontier: AI-Enabled off-target prediction and sgRNA Design for Unprecedented Precision

Prediction of sgRNA Off-Target Activity in CRISPR/Cas9 Gene Editing Using Graph Convolution Network

Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR

CRISPRoffT: comprehensive database of CRISPR/Cas off-targets

Leveraging uncertainty quantification to optimise CRISPR guide RNA selection