Improving comparative analyses of Hi-C data via contrastive self-supervised learning

Han Li,Xuan He,Lawrence Kurowski,Ruotian Zhang,Dan Zhao,Jianyang Zeng
DOI: https://doi.org/10.1093/bib/bbad193
IF: 9.5
2023-06-09
Briefings in Bioinformatics
Abstract:Hi-C is a widely applied chromosome conformation capture (3C)-based technique, which has produced a large number of genomic contact maps with high sequencing depths for a wide range of cell types, enabling comprehensive analyses of the relationships between biological functionalities (e.g. gene regulation and expression) and the three-dimensional genome structure. Comparative analyses play significant roles in Hi-C data studies, which are designed to make comparisons between Hi-C contact maps, thus evaluating the consistency of replicate Hi-C experiments (i.e. reproducibility measurement) and detecting statistically differential interacting regions with biological significance (i.e. differential chromatin interaction detection). However, due to the complex and hierarchical nature of Hi-C contact maps, it remains challenging to conduct systematic and reliable comparative analyses of Hi-C data. Here, we proposed sslHiC, a contrastive self-supervised representation learning framework, for precisely modeling the multi-level features of chromosome conformation and automatically producing informative feature embeddings for genomic loci and their interactions to facilitate comparative analyses of Hi-C contact maps. Comprehensive computational experiments on both simulated and real datasets demonstrated that our method consistently outperformed the state-of-the-art baseline methods in providing reliable measurements of reproducibility and detecting differential interactions with biological meanings.
biochemical research methods,mathematical & computational biology
What problem does this paper attempt to address?