Predicting the effect of CRISPR-Cas9-based epigenome editing

Sanjit Singh Batra, Alan Cabrera, Jeffrey P. Spence, Jacob Goell, Selvalakshmi S. Anand, Isaac Hilton, Yun S. Song
DOI: https://doi.org/10.1101/2023.10.03.560674
2024-10-16
Abstract: Epigenetic regulation orchestrates mammalian transcription, but functional links between them remain elusive. To tackle this problem, we use epigenomic and transcriptomic data from 13 ENCODE cell types to train machine learning models to predict gene expression from histone post-translational modifications (PTMs), achieving transcriptome-wide correlations of ~0.70-0.79 for most cell types. Our models recapitulate known associations between histone PTMs and expression patterns, including predicting that acetylation of histone subunit H3 lysine residue 27 (H3K27ac) near the transcription start site (TSS) significantly increases expression levels. To validate this prediction experimentally and investigate how natural vs. engineered deposition of H3K27ac might differentially affect expression, we apply the synthetic dCas9-p300 histone acetyltransferase system to 8 genes in the HEK293T cell line and to 5 genes in the K562 cell line. Further, to facilitate model building, we perform MNase-seq to map genome-wide nucleosome occupancy levels in HEK293T. We observe that our models perform well in accurately ranking relative fold-changes among genes in response to the dCas9-p300 system; however, their ability to rank fold-changes within individual genes is noticeably diminished compared to predicting expression across cell types from their native epigenetic signatures. Our findings highlight the need for more comprehensive genome-scale epigenome editing datasets, better understanding of the actual modifications made by epigenome editing tools, and improved causal models that transfer better from endogenous cellular measurements to perturbation experiments. Together these improvements would facilitate the ability to understand and predictably control the dynamic human epigenome with consequences for human health.
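The model-fitting step described in the abstract (predicting expression from histone PTM signal near the TSS and scoring by correlation) can be illustrated with a minimal sketch. It assumes ridge regression, one of the model families the paper reports using, applied to a hypothetical grid of 6 marks × 10 TSS-centred bins flattened into one feature vector per gene. The data are synthetic, the penalty `lam=10.0` is arbitrary, and `fit_ridge` and `spearman` are illustrative helpers, not the authors' code.

```python
import numpy as np


def spearman(x, y):
    """Spearman rank correlation; assumes no ties (safe for continuous data)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean  # centre features so the intercept is not penalized
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def predict(X, w, b):
    return X @ w + b


# Hypothetical toy data: 6 marks x 10 TSS-centred bins per gene, flattened.
rng = np.random.default_rng(0)
n_genes, n_marks, n_bins = 2000, 6, 10
X = rng.normal(size=(n_genes, n_marks * n_bins))
true_w = rng.normal(scale=0.3, size=n_marks * n_bins)
y = X @ true_w + rng.normal(scale=1.0, size=n_genes)  # synthetic log-expression

# Fit on 1,500 genes and score on the 500 held-out genes.
train, test = slice(0, 1500), slice(1500, None)
w, b = fit_ridge(X[train], y[train], lam=10.0)
rho = spearman(predict(X[test], w, b), y[test])
print(f"held-out Spearman rho = {rho:.2f}")
```

Scoring on held-out genes keeps the correlation an estimate of generalization; an evaluation across cell types would instead train on some cell types and score on another. The `spearman` helper skips tie handling, which is fine for continuous synthetic values, whereas real expression data with many zeros would need average ranks (for example via `scipy.stats.spearmanr`).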
Bioinformatics
What problem does this paper attempt to address?
This paper addresses the problem of predicting how CRISPR-Cas9-based epigenome editing affects gene expression. Specifically, the researchers used epigenomic and transcriptomic data from 13 ENCODE cell types to train machine-learning models that predict gene expression from histone post-translational modifications (PTMs). These models achieve transcriptome-wide correlations of approximately 0.70 to 0.79 for most cell types. With these models, the researchers aim to understand and predict how locally altering a specific histone modification (such as H3K27ac) with CRISPR-Cas9-based tools changes gene expression.

### Main research questions:

1. **Building models to predict gene expression**: The researchers trained machine-learning models on large-scale epigenomic and transcriptomic datasets to predict gene expression levels from histone PTMs.
2. **Validating the models' predictions**: To test the models' predictive ability, the researchers applied the synthetic dCas9-p300 histone acetyltransferase system to 8 genes in the HEK293T cell line and to 5 genes in the K562 cell line.
3. **Exploring the impact of local H3K27ac deposition**: The researchers focused on H3K27ac and combined experiments with computational models to explore how locally deposited H3K27ac affects gene expression.

### Key methods:

- **Data collection and processing**: Histone PTM ChIP-seq and RNA-seq data for 13 human cell types were obtained from the ENCODE project and corrected for batch effects. In addition, MNase-seq was performed to map genome-wide nucleosome occupancy in HEK293T.
- **Model training**: Convolutional neural networks (CNNs) and ridge regression models were trained to predict gene expression from histone PTM data near the transcription start site (TSS).
- **Experimental validation**: Epigenome editing was performed with the dCas9-p300 system in the HEK293T and K562 cell lines, and the relative mRNA abundance of the target genes was measured by qPCR.
- **Computational simulation**: The effect of dCas9-p300 on H3K27ac levels was simulated computationally, and the resulting model predictions were compared with the experimental results.

### Main findings:

- **Predictive ability of the models**: The models predict gene expression accurately, particularly across cell types, with Spearman rank correlations of approximately 0.8.
- **Impact of local H3K27ac deposition**: Targeted H3K27ac deposition can significantly increase gene expression, but efficiency varies widely between gRNAs.
- **Limitations of the models**: The models rank relative fold-changes among genes well, but their ability to rank fold-changes within individual genes (for example, between different gRNAs targeting the same gene) is noticeably weaker than their ability to predict expression across cell types from native epigenetic signatures.

### Conclusion:

This study shows how machine-learning models combined with experimental data can be used to predict the effect of CRISPR-Cas9-mediated epigenome editing on gene expression. Although the models rank expression changes across genes well, their predictions within individual genes need further improvement. The authors argue that this will require more comprehensive genome-scale epigenome editing datasets, a better understanding of the modifications that epigenome editing tools actually make, and causal models that transfer better from endogenous cellular measurements to perturbation experiments.
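The in silico perturbation described under Key methods (simulating dCas9-p300, then comparing predicted and measured fold-changes by rank) can be sketched as follows. This is a minimal illustration under assumed choices, not the authors' pipeline: a linear model on log(1 + signal) features with hand-set weights stands in for a trained CNN or ridge model, dCas9-p300 is modeled as adding a fixed amount (`delta`) of raw H3K27ac signal to a single TSS bin, and `H3K27AC`, `TSS_BIN`, `featurize`, `simulate_p300`, and the "observed" fold-changes are all hypothetical.

```python
import numpy as np

H3K27AC = 0  # hypothetical index of the H3K27ac track
TSS_BIN = 5  # hypothetical index of the bin overlapping the TSS


def spearman(x, y):
    """Spearman rank correlation; assumes no ties (safe for continuous data)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])


def featurize(raw):
    """Hypothetical transform: log(1 + signal) per mark and bin, flattened per gene."""
    return np.log1p(raw).reshape(raw.shape[0], -1)


def simulate_p300(raw, delta, mark=H3K27AC, bin_idx=TSS_BIN):
    """Copy of the raw tracks with `delta` extra signal added to one mark in one bin."""
    perturbed = raw.copy()
    perturbed[:, mark, bin_idx] += delta
    return perturbed


def predicted_log_fc(raw, perturbed, w):
    """Predicted log fold-change of a linear model; the intercept cancels."""
    return (featurize(perturbed) - featurize(raw)) @ w


rng = np.random.default_rng(1)
n_genes, n_marks, n_bins = 50, 6, 10
raw = rng.gamma(shape=2.0, scale=5.0, size=(n_genes, n_marks, n_bins))
w = rng.normal(scale=0.1, size=n_marks * n_bins)
w[H3K27AC * n_bins + TSS_BIN] = 0.5  # assume TSS acetylation is activating

pred_lfc = predicted_log_fc(raw, simulate_p300(raw, delta=20.0), w)

# Placeholder "measured" values (synthetic, NOT the paper's qPCR data).
observed_lfc = pred_lfc + rng.normal(scale=0.05, size=n_genes)
rho_across_genes = spearman(pred_lfc, observed_lfc)
print(f"Spearman(predicted, observed) = {rho_across_genes:.2f}")
```

Because the placeholder measurements are built from the predictions, the resulting correlation only demonstrates the comparison step. The log transform matters in this sketch: with a purely linear model on raw signal, a fixed additive `delta` would give every gene the same predicted change, so a ranking across genes has to come from some nonlinearity (here `log1p`, which makes genes with low baseline acetylation respond most). Ranking within a single gene would instead vary `bin_idx` across the positions targeted by different gRNAs.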