Abstract: Epigenetic regulation orchestrates mammalian transcription, but functional links between them remain elusive. To tackle this problem, we use epigenomic and transcriptomic data from 13 ENCODE cell types to train machine learning models to predict gene expression from histone post-translational modifications (PTMs), achieving transcriptome-wide correlations of ~0.70-0.79 for most cell types. Our models recapitulate known associations between histone PTMs and expression patterns, including predicting that acetylation of histone subunit H3 lysine residue 27 (H3K27ac) near the transcription start site (TSS) significantly increases expression levels. To validate this prediction experimentally and investigate how natural vs. engineered deposition of H3K27ac might differentially affect expression, we apply the synthetic dCas9-p300 histone acetyltransferase system to 8 genes in the HEK293T cell line and to 5 genes in the K562 cell line. Further, to facilitate model building, we perform MNase-seq to map genome-wide nucleosome occupancy levels in HEK293T. We observe that our models perform well in accurately ranking relative fold-changes among genes in response to the dCas9-p300 system; however, their ability to rank fold-changes within individual genes is noticeably diminished compared to predicting expression across cell types from their native epigenetic signatures. Our findings highlight the need for more comprehensive genome-scale epigenome editing datasets, better understanding of the actual modifications made by epigenome editing tools, and improved causal models that transfer better from endogenous cellular measurements to perturbation experiments. Together these improvements would facilitate the ability to understand and predictably control the dynamic human epigenome with consequences for human health.
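The modelling approach in this abstract (regressing expression on histone PTM signal around the TSS, then scoring predictions with a rank correlation) can be illustrated with ridge regression, one of the model types the study reports using alongside CNNs. The following is a minimal pure-Python sketch, not the authors' pipeline: the feature layout (a few marks times a few TSS-proximal bins), the penalty `lam`, the function names and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y.

    X: list of feature rows, e.g. histone-PTM signal in bins around each TSS.
    y: list of targets, e.g. log expression. No intercept is fitted, so the
    caller should center X and y (the toy data below is already zero-mean).
    """
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix [A | b].
    M = [A[i] + [b[i]] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    # Back-substitution.
    w = [0.0] * p
    for i in range(p - 1, -1, -1):
        w[i] = (M[i][p] - sum(M[i][j] * w[j] for j in range(i + 1, p))) / M[i][i]
    return w


def predict(X, w):
    return [sum(x * wi for x, wi in zip(row, w)) for row in X]


def average_ranks(v):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def spearman(a, b):
    """Spearman's rho is the Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(a), average_ranks(b))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy, synthetic data only: 6 features = 3 hypothetical marks x 2 TSS-proximal bins.
    true_w = [0.8, 0.5, -0.6, -0.3, 0.4, 0.1]
    X = [[rng.gauss(0, 1) for _ in true_w] for _ in range(200)]
    y = [sum(x * w for x, w in zip(row, true_w)) + rng.gauss(0, 0.5) for row in X]
    w = ridge_fit(X[:150], y[:150], lam=1.0)      # train on 150 toy "genes"
    rho = spearman(predict(X[150:], w), y[150:])  # score on 50 held-out genes
    print(f"held-out Spearman rho = {rho:.2f}")
```

A closed-form solve is adequate for a handful of features; with thousands of binned features a library implementation would be the practical choice. Spearman's rho is used here because it scores how well genes are ranked rather than the exact predicted values.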

What problem does this paper attempt to address?

This paper addresses the problem of predicting how CRISPR-Cas9-based epigenome editing affects gene expression. Specifically, the researchers used epigenomic and transcriptomic data from 13 ENCODE cell types to train machine-learning models that predict gene expression from histone post-translational modifications (PTMs). These models achieve transcriptome-wide correlations of approximately 0.70 to 0.79 for most cell types. With these models, the researchers aim to understand and predict how locally altering specific histone modifications (such as H3K27ac) with CRISPR-Cas9-based tools changes gene expression.

### Main research questions:

1. **Building models to predict gene expression**: The researchers trained machine-learning models on large-scale epigenomic and transcriptomic datasets to predict gene expression levels from histone PTMs.
2. **Testing the models' predictions**: To test the models' predictive ability, the researchers applied the synthetic dCas9-p300 histone acetyltransferase system to 8 genes in the HEK293T cell line and to 5 genes in the K562 cell line.
3. **Exploring the impact of local H3K27ac deposition**: The researchers focused on H3K27ac and combined experiments with computational models to explore how locally deposited H3K27ac affects gene expression.

### Key methods:

- **Data collection and processing**: Histone PTM ChIP-seq and RNA-seq data for 13 human cell types were obtained from the ENCODE project and corrected for batch effects.
- **Model training**: Convolutional neural networks (CNNs) and ridge regression models were used to predict gene expression from histone PTM data near the transcription start site (TSS).
- **Experimental validation**: Epigenome editing was performed with the dCas9-p300 system in the HEK293T cell line, and the relative mRNA abundance of the target genes was measured by qPCR.
- **Computational simulation**: The effect of dCas9-p300 on H3K27ac levels was simulated computationally, and the resulting predictions were compared with the experimental results.

### Main findings:

- **Predictive ability of the models**: The models predict gene expression accurately, especially across cell types, with Spearman rank correlations reaching approximately 0.8.
- **Impact of local H3K27ac deposition**: The experiments show that local H3K27ac deposition can significantly increase gene expression, but the efficiency varies widely between gRNAs.
- **Limitations of the models**: Although the models rank relative fold-changes among genes well, they are noticeably worse at ranking fold-changes within individual genes.

### Conclusion:

This study shows how machine-learning models can be combined with experimental data to predict the effect of CRISPR-Cas9-mediated epigenome editing on gene expression. Although the models rank relative changes in expression across genes well, their predictions within individual genes need further improvement. These findings provide a reference point for future epigenome-editing research.
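The computational-simulation step (adding H3K27ac in silico where dCas9-p300 is targeted, predicting the resulting fold-change, then comparing predictions with measurements across and within genes) can be sketched as follows. Everything specific here is an assumption for illustration, not the study's method: the 100 bp bins spanning ±2 kb, the triangular deposition profile and its `boost`/`width`, the log1p feature transform, the use of a linear model, and all function names are hypothetical.

```python
import math

# Hypothetical binning: 40 bins of 100 bp spanning -2 kb..+2 kb around the TSS.
BIN_SIZE = 100
N_BINS = 40


def bin_index(offset_bp):
    """Map a gRNA target offset (bp relative to the TSS) to a bin index."""
    return (offset_bp + N_BINS * BIN_SIZE // 2) // BIN_SIZE


def deposit_h3k27ac(raw_signal, offset_bp, boost=5.0, width=2):
    """Return a copy of a raw (not log-transformed) binned H3K27ac track with
    extra signal added around the dCas9-p300 target, using a triangular
    profile over `width` bins on each side. Shape and magnitude are assumptions."""
    center = bin_index(offset_bp)
    out = list(raw_signal)
    for b in range(center - width, center + width + 1):
        if 0 <= b < len(out):
            out[b] += boost * (1 - abs(b - center) / (width + 1))
    return out


def predicted_log_fc(weights, raw_before, raw_after):
    """Linear model on log1p features: the predicted change in log expression is
    sum_b w_b * (log1p(after_b) - log1p(before_b)). Because of the log1p, the
    same deposited signal moves a weakly acetylated gene more than a strongly
    acetylated one."""
    return sum(w * (math.log1p(a) - math.log1p(b))
               for w, a, b in zip(weights, raw_after, raw_before))


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (no tie handling)."""
    def rank(v):
        r = [0.0] * len(v)
        for pos, i in enumerate(sorted(range(len(v)), key=v.__getitem__)):
            r[i] = float(pos)
        return r
    rx, ry = rank(x), rank(y)
    m = (len(x) - 1) / 2  # mean of ranks 0..n-1
    cov = sum((a - m) * (b - m) for a, b in zip(rx, ry))
    var = sum((a - m) ** 2 for a in rx)  # identical for rx and ry without ties
    return cov / var


def across_and_within(pred, obs):
    """pred, obs: dict gene -> list of log fold-changes, one entry per gRNA.
    Returns (rho across genes on per-gene means, {gene: rho across its gRNAs})."""
    def mean(v):
        return sum(v) / len(v)
    genes = sorted(pred)
    across = spearman([mean(pred[g]) for g in genes], [mean(obs[g]) for g in genes])
    within = {g: spearman(pred[g], obs[g]) for g in genes if len(pred[g]) >= 3}
    return across, within


if __name__ == "__main__":
    # Hypothetical weights: acetylation just downstream of the TSS matters most.
    weights = [0.02] * N_BINS
    for b in range(20, 25):
        weights[b] = 0.3
    low, high = [0.5] * N_BINS, [20.0] * N_BINS  # toy baseline tracks
    for name, track in (("low baseline", low), ("high baseline", high)):
        after = deposit_h3k27ac(track, offset_bp=100)
        print(name, round(predicted_log_fc(weights, track, after), 3))
```

Under these assumptions, the log transform makes the same deposited signal produce a larger predicted change at a weakly acetylated gene than at a heavily acetylated one, which gives the model something to rank across genes. `across_and_within` returns the two kinds of score the findings contrast: one Spearman correlation over per-gene mean fold-changes, and one per gene over its gRNAs.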

Predicting the effect of CRISPR-Cas9-based epigenome editing

Machine learning methods for predicting guide RNA effects in CRISPR epigenome editing experiments

Investigating crosstalk between H3K27 acetylation and H3K4 trimethylation in CRISPR/dCas-based epigenome editing and gene activation

Understanding Variation in Transcription Factor Binding by Modeling Transcription Factor Genome-Epigenome Interactions

Tailoring a CRISPR/Cas-based Epigenome Editor for Programmable Chromatin Acylation and Decreased Cytotoxicity

Programmable human histone phosphorylation and gene activation using a CRISPR/Cas9-based chromatin kinase

Systematic epigenome editing captures the context-dependent instructive function of chromatin modifications

HyperCas12a enables highly-multiplexed epigenome editing screens

Advances in CRISPR-Cas systems for epigenetics

Interrogation of enhancer function by enhancer-targeting CRISPR epigenetic editing

Genome-wide determination of on-target and off-target characteristics for RNA-guided DNA methylation by dCas9 methyltransferases

A Biophysical Model of CRISPR/Cas9 Activity for Rational Design of Genome Editing and Gene Regulation

Histone editing elucidates the functional roles of H3K27 methylation and acetylation in mammals

A combinatorial domain screening platform reveals epigenetic effector interactions for transcriptional perturbation

Regulation of gene expression by altered promoter methylation using a CRISPR/Cas9-mediated epigenetic editing system

Model-based analysis of chromatin interactions from dCas9-Based CAPTURE-3C-seq

Higher-order combinatorial chromatin perturbations by engineered CRISPR-Cas12a for functional genomics

Comparative Analysis of Machine Learning Algorithms for Predicting On-Target and Off-Target Effects of CRISPR-Cas13d for gene editing

Genome-wide specificity of DNA binding, gene regulation, and chromatin remodeling by TALE- and CRISPR/Cas9-based transcriptional activators

Engineered CRISPR-Cas12a for higher-order combinatorial chromatin perturbations

Rewriting regulatory DNA to dissect and reprogram gene expression