CRISPR-DIPOFF: an interpretable deep learning approach for CRISPR Cas-9 off-target prediction

Md Toufikuzzaman, Md Abul Hassan Samee, M Sohel Rahman
DOI: https://doi.org/10.1093/bib/bbad530
IF: 9.5
2024-01-22
Briefings in Bioinformatics
Abstract: CRISPR Cas-9 is a groundbreaking genome-editing tool that harnesses bacterial defense systems to alter DNA sequences accurately. This innovative technology holds vast promise in multiple domains like biotechnology, agriculture and medicine. However, such power does not come without its own peril, and one such issue is the potential for unintended modifications (Off-Target), which highlights the need for accurate prediction and mitigation strategies. Though previous studies have demonstrated improvement in Off-Target prediction capability with the application of deep learning, they often struggle with the precision-recall trade-off, limiting their effectiveness and do not provide proper interpretation of the complex decision-making process of their models. To address these limitations, we have thoroughly explored deep learning networks, particularly the recurrent neural network based models, leveraging their established success in handling sequence data. Furthermore, we have employed genetic algorithm for hyperparameter tuning to optimize these models’ performance. The results from our experiments demonstrate significant performance improvement compared with the current state-of-the-art in Off-Target prediction, highlighting the efficacy of our approach. Furthermore, leveraging the power of the integrated gradient method, we make an effort to interpret our models resulting in a detailed analysis and understanding of the underlying factors that contribute to Off-Target predictions, in particular the presence of two sub-regions in the seed region of single guide RNA which extends the established biological hypothesis of Off-Target effects. To the best of our knowledge, our model can be considered as the first model combining high efficacy, interpretability and a desirable balance between precision and recall.
biochemical research methods,mathematical & computational biology
What problem does this paper attempt to address?
This paper addresses the prediction of off-target effects in CRISPR Cas-9 gene editing. Although CRISPR Cas-9 has great potential for precisely editing DNA sequences, it can also edit unintended sites (off-target effects). These edits may introduce unexpected mutations and thereby compromise experimental results or treatment outcomes, so accurately predicting and reducing them is crucial for the successful application of CRISPR technology.

Existing deep learning methods have improved off-target prediction, but they struggle with the trade-off between precision and recall, which limits their effectiveness. Most of them also cannot explain their complex decision-making processes. This is a significant shortcoming in practice, because understanding how a model reaches its decisions is important for ensuring the safety and reliability of the technology.

To address these problems, the paper proposes CRISPR-DIPOFF, an interpretable deep learning framework for predicting CRISPR Cas-9 off-target effects. The main contributions are:

1. **Proposing CRISPR-DIPOFF**: An interpretable suite of deep learning models that accurately predicts off-target sites from sequence data.
2. **Optimizing hyperparameters**: Using a genetic algorithm to tune the hyperparameters of the deep learning models and improve their performance.
3. **Model explanation**: Interpreting the models with the Integrated Gradients method and connecting the results to known biological hypotheses, in particular identifying two possible sub-regions in the seed region of the single guide RNA, one of which is positively correlated with off-target effects.
4. **Modular implementation**: Developing a modular implementation and releasing the code publicly so that the approach is easy to replicate and extend.
5. **Balancing precision and recall**: To the best of the authors' knowledge, this is the first model to combine high efficacy and interpretability with a desirable balance between precision and recall.

Through these contributions, the paper aims to provide a more accurate, reliable and interpretable method for predicting CRISPR Cas-9 off-target effects, thereby supporting the application of this technology in biotechnology, agriculture and medicine.
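
To make the Integrated Gradients idea mentioned above more concrete, here is a minimal sketch of how per-position attributions can be computed for a sequence classifier. It is not the authors' implementation: the model is a toy logistic regression with random weights standing in for a trained network, the one-hot encoding of a single sequence, the all-zero baseline, the example sequence and every function name are assumptions made purely for illustration.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) one-hot matrix."""
    x = np.zeros((len(seq), len(BASES)))
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x

def model(x, w, b):
    """Toy stand-in for a trained classifier: logistic regression on the encoding."""
    return 1.0 / (1.0 + np.exp(-(np.sum(w * x) + b)))

def model_grad(x, w, b):
    """Analytic gradient of the toy model's output with respect to its input."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=300):
    """Midpoint Riemann approximation of Integrated Gradients along the
    straight-line path from the baseline to the input."""
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.zeros_like(x)
    for alpha in alphas:
        avg_grad += model_grad(baseline + alpha * (x - baseline), w, b)
    avg_grad /= steps
    # Scale the path-averaged gradient by the input-baseline difference.
    return (x - baseline) * avg_grad

# Hypothetical 23-nt sequence and random weights, for illustration only.
seq = "GACGCATAAAGATGAGACGCTGG"
rng = np.random.default_rng(0)
w = rng.normal(size=(len(seq), len(BASES)))
b = 0.0

x = one_hot(seq)
baseline = np.zeros_like(x)  # assumed all-zero reference input

attr = integrated_gradients(x, baseline, w, b)
per_position = attr.sum(axis=1)  # one attribution score per nucleotide position

for pos, score in enumerate(per_position, start=1):
    print(f"position {pos:2d} ({seq[pos - 1]}): {score:+.4f}")
```

The attributions sum (approximately) to the difference between the model's output on the input and on the baseline, which is a useful sanity check. With a real trained network the analytic gradient would be replaced by automatic differentiation, and positions with consistently large attributions across many examples are the kind of evidence that could point to sub-regions such as those the paper reports in the seed region.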