CREATE: cell-type-specific cis-regulatory elements identification via discrete embedding

Xuejian Cui, Qijin Yin, Zijing Gao, Zhen Li, Xiaoyang Chen, Shengquan Chen, Qiao Liu, Wanwen Zeng, Rui Jiang
DOI: https://doi.org/10.1101/2024.10.02.616391
2024-10-03
Abstract: Identifying cis-regulatory elements (CREs) within non-coding genomic regions, such as enhancers, silencers, promoters, and insulators, is pivotal for elucidating the intricate gene regulatory mechanisms underlying complex biological traits. Prevalent sequence-based methods often focus on a single CRE type, limiting insights into cell-type-specific biological implications. Here, we introduce CREATE, a multimodal deep learning model based on the Vector Quantized Variational AutoEncoder framework, designed to extract discrete CRE embeddings and classify multiple CRE classes using genomic sequences, chromatin accessibility, and chromatin interaction data. CREATE excels in accurate CRE identification and exhibits strong effectiveness and robustness. We showcase CREATE's capability to generate a comprehensive CRE-specific feature spectrum, offering quantitative and interpretable insights into CRE specificity. By enabling large-scale prediction of CREs in specific cell types, CREATE facilitates the recognition of disease- or phenotype-related biological variability of CREs, thereby expanding our understanding of gene regulation landscapes.
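The "discrete embedding" in a Vector Quantized Variational AutoEncoder comes from snapping each continuous encoder output to its nearest entry in a learned codebook. The sketch below illustrates only that generic lookup step; it is not CREATE's implementation, and the function name `quantize`, the codebook values, and the input vectors are all hypothetical toy choices.

```python
from typing import List, Sequence, Tuple


def quantize(
    vectors: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each continuous vector to its nearest codebook entry (squared L2)."""
    indices: List[int] = []
    quantized: List[List[float]] = []
    for v in vectors:
        # Squared Euclidean distance from v to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        k = dists.index(min(dists))  # closest code; first one wins on ties
        indices.append(k)
        quantized.append(list(codebook[k]))
    return indices, quantized


# Toy, made-up values: three 2-D codes and two "encoder outputs".
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
encoded = [[0.9, 1.2], [-0.2, 0.1]]
indices, quantized = quantize(encoded, codebook)
print(indices)    # [1, 0]
print(quantized)  # [[1.0, 1.0], [0.0, 0.0]]
```

The integer indices are the discrete codes; in a trained model the codebook is learned jointly with the encoder and decoder, which this sketch does not cover.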
Bioinformatics