Uncovering topologically associating domains from three-dimensional genome maps with TADGATE

Dachang Dang, Shao-Wu Zhang, Kangning Dong, Ran Duan, Shihua Zhang
DOI: https://doi.org/10.1101/2024.06.12.598668
2024-06-14
Abstract: Topologically associating domains (TADs) emerge as indispensable units in three-dimensional (3D) genome organization, playing a critical role in gene regulation. However, accurately identifying TADs from sparse chromatin contact maps and exploring the structural and functional elements within TADs remain challenging. To this end, we develop a graph attention auto-encoder, TADGATE, to accurately identify TADs even from ultra-sparse contact maps and generate the imputed maps while preserving or enhancing the underlying topological structures. TADGATE can capture specific attention patterns, pointing to two types of units with different characteristics in TADs. Moreover, we find that the organization of TADs is closely associated with chromatin compartmentalization, and TAD boundaries in different compartmental environments exhibit distinct biological properties. We also utilize a two-layer Hidden Markov Model to functionally annotate the TADs and their internal regions, revealing the overall properties of TADs and the distribution of the structural and functional elements within TADs. Finally, we apply TADGATE to highly sparse and noisy Hi-C contact maps from 21 human tissues or cell lines, enhancing the clarity of TAD structures, investigating the nature of conserved and cell type-specific boundaries, and unveiling the cell type-specific transcriptional regulatory mechanisms associated with topological domains.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses the following problems:

1. **Accurately identifying topologically associating domains (TADs) from sparse chromatin contact maps**: Although TADs play a crucial role in gene regulation, identifying them accurately from sparse and noisy Hi-C data remains a challenge. Many TAD-calling methods exist, but they perform poorly on data with low sequencing depth.
2. **Generating imputed contact maps while preserving or enhancing the underlying topological structure**: TADGATE, a graph attention auto-encoder, not only identifies TADs accurately but also smooths and imputes sparse Hi-C contact maps, which makes the domain structures clearer.
3. **Exploring different types of TAD boundaries and their biological characteristics**: The paper finds that TAD organization is closely associated with chromatin compartmentalization and classifies TAD boundaries by their compartmental environment, showing that boundaries in different environments have distinct biological properties. It also investigates conserved and cell type-specific boundaries and the cell type-specific transcriptional regulatory mechanisms associated with topological domains.
4. **Functionally annotating TADs and their internal regions**: Using a two-layer hidden Markov model, the paper functionally annotates TADs and their internal regions, revealing the overall properties of TADs and the distribution of structural and functional elements within them.

In summary, the paper proposes a computational framework, TADGATE, to overcome the limitations of existing methods on sparse Hi-C data and to characterize the functional properties of TAD boundaries and the elements within TADs.