Large Serine Integrase Off-Target Discovery with Deep Learning for Genome Wide Prediction

Matthew H Bakalar, Thomas Biondi, Xiaoyu Liang, Didac Santesmasses, Anne M Barra, Japan B Mehta, Jie Wang, Dane Z Hazelbaker, Jonathan D Finn, Daniel J O'Connell
DOI: https://doi.org/10.1101/2024.10.10.617699
2024-10-13
Abstract: Large Serine Integrases (LSIs) hold significant therapeutic promise due to their ability to efficiently incorporate gene-sized DNA into the human genome, offering a method to integrate healthy genes in patients with monogenic disorders or to insert gene circuits for the development of advanced cell therapies. To advance LSIs toward human therapeutic use, new technologies and analytical methods for predicting and characterizing off-target recombination by LSIs are required. It is not experimentally tractable to validate off-target editing at all potential off-target sites in therapeutically relevant cell types because of sample limitations and genetic variation in the human population. To address this gap, we constructed a deep learning model named IntQuery that can predict LSI activity genome-wide. For Bxb1 integrase, IntQuery was trained on quantitative off-target data from 410,776 cryptic attB sequences discovered by Cryptic-seq, an unbiased in vitro discovery technology for LSI off-target recombination. We show that IntQuery can accurately predict in vitro LSI activity, providing a tool for in silico off-target prediction of large serine integrases to advance therapeutic applications.
Bioinformatics