ChromBERT: Uncovering Chromatin State Motifs in the Human Genome Using a BERT-based Approach

Seohyun Lee, Che Lin, Chien-Yu Chen, Ryuichiro Nakato
DOI: https://doi.org/10.1101/2024.07.25.605219
2024-07-26
Abstract: Chromatin states, fundamental to gene regulation and cellular identity, are defined by unique combinations of histone post-translational modifications. Despite their importance, comprehensive patterns within chromatin state sequences, which could provide insights into key biological functions, remain largely unexplored. In this study, we introduce ChromBERT, a BERT-based model specifically designed to detect distinct patterns in sequences of chromatin state annotations. Notably, ChromBERT was pre-trained on promoter regions across a diverse range of epigenomes and subsequently fine-tuned using a dataset from multiple cell lines where RNA-seq data were available, highlighting the model's ability to discern conserved chromatin state patterns within these regions. In addition to its predictive power across tasks, evidenced by high AUC scores, ChromBERT supports further analysis through motif clustering using Dynamic Time Warping (DTW). This method enhances the model's ability to dissect chromatin state sequence motifs, typically involving transcription and enhancer sites. The introduction of motif clustering with DTW into ChromBERT's workflow is poised to facilitate the discovery of genomic regions linked to novel biological functions, deepening our understanding of chromatin state dynamics.
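To illustrate the Dynamic Time Warping step mentioned in the abstract, here is a minimal sketch of a DTW distance between two chromatin state sequences. It is not the authors' implementation. The letter encoding of states ("A", "B", ...), the `encode` helper, and the use of the absolute difference between state codes as the local cost are all assumptions made for illustration.

```python
from typing import List, Sequence


def encode(states: str) -> List[int]:
    """Map state letters to integers (A -> 1, B -> 2, ...).

    Hypothetical encoding: the abstract does not say how states are represented.
    """
    return [ord(c) - ord("A") + 1 for c in states]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic-programming DTW distance.

    The local cost is |a[i] - b[j]|, an assumption chosen for simplicity.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Two motifs with the same order of states but different run lengths
# align perfectly under DTW, which is why it suits motif clustering.
same_shape = dtw_distance(encode("AABBB"), encode("ABB"))
different = dtw_distance(encode("AAA"), encode("CCC"))
```

In this sketch, sequences that visit the same states in the same order but dwell in them for different lengths get a distance of zero, whereas sequences occupying different states accumulate cost at every aligned position. A pairwise distance matrix built this way could then be passed to any standard clustering routine.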
Bioinformatics
What problem does this paper attempt to address?