Research on Hi-C Data Enhancement Technology Based on Generative Adversarial Networks

Qian Bai, Zhe Cheng, Shipu Wang, Wei Zhou
DOI: https://doi.org/10.1088/1757-899x/782/5/052029
2020-01-01
IOP Conference Series: Materials Science and Engineering
Abstract: Hi-C technology is one of the most popular tools for studying three-dimensional (3D) genome organization. Due to the high cost of sequencing, most Hi-C data have low resolution and cannot be used to connect distal regulatory elements to their target genes. To address the difficulty of obtaining high-resolution Hi-C data, this paper proposes a Hi-C enhancement method (HiCGAN) based on generative adversarial networks. Taking as input a down-sampled interaction matrix that is highly similar to the original matrix, the method can generate a high-resolution Hi-C interaction matrix from only 1/16 of the original sequencing reads. In the experiments, the Pearson correlation coefficient was used to measure the similarity in numerical distribution between the generated high-resolution matrix and the real high-resolution Hi-C matrix. Significant interaction pairs were identified with Fit-Hi-C, and ChromHMM was used to annotate 12 chromatin states. Experimental results show that HiCGAN models trained on one cell type can predict high-resolution Hi-C matrices for other cell types. This study thus proposes a computational framework (HiCGAN) that accurately predicts Hi-C data and improves its resolution.
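The two data-handling steps mentioned in the abstract, down-sampling a contact matrix to 1/16 of its reads and scoring similarity with the Pearson correlation coefficient, can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not say how reads were down-sampled, so binomial thinning (each read kept independently with probability 1/16) is an assumption here, and `toy_matrix`, `downsample`, and `pearson` are hypothetical helper names operating on a synthetic matrix, not real Hi-C data.

```python
import math
import random


def toy_matrix(n=8, scale=1600):
    """Synthetic symmetric contact matrix whose counts decay with bin distance.

    Purely illustrative; real Hi-C matrices come from aligned read pairs.
    """
    return [[scale // (1 + abs(i - j)) for j in range(n)] for i in range(n)]


def downsample(matrix, ratio=1 / 16, seed=0):
    """Binomially thin each contact count (assumed down-sampling scheme).

    Each read in the upper triangle is kept with probability `ratio`,
    and the result is mirrored so the output stays symmetric.
    """
    rng = random.Random(seed)
    n = len(matrix)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            kept = sum(1 for _ in range(matrix[i][j]) if rng.random() < ratio)
            out[i][j] = out[j][i] = kept
    return out


def pearson(a, b):
    """Pearson correlation coefficient between two equally sized matrices."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant matrix")
    return cov / (sx * sy)


high = toy_matrix()
low = downsample(high)
print("fraction of reads kept:", sum(map(sum, low)) / sum(map(sum, high)))
print("Pearson(high, low):", pearson(high, low))
```

In the setting the abstract describes, the low-coverage matrix would be the model input, and the Pearson score would be computed between the model's output and the real high-resolution matrix rather than between `high` and `low` directly.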