High-resolution Diploid 3D Genome Reconstruction Using Pore-C Data

Ying Chen, Zhuo-Bin Lin, Shao-Kai Wang, Bo Wu, Long-Jian Niu, Jia-Yong Zhong, Yi-Meng Sun, Xin Bai, Luo-Ran Liu, Wei Xie, Ruibang Luo, Chunhui Hou, Feng Luo, Chuan-Le Xiao
DOI: https://doi.org/10.1101/2023.08.29.555243
2023-01-01
Abstract: In diploid organisms, spatial variations between homologous chromosomes are essential to many biological phenomena. Currently, it is still challenging to efficiently reconstruct a high-quality diploid 3D human genome. Here, we introduce Dip3D, reconstructing the diploid 3D human genome using Pore-C data of one sample. Dip3D has solved multiple problems in genome-wide SNV calling and haplo-tagging caused by the high sequencing error rates in Pore-C type data. Dip3D capitalizes on the high-order chromosomal interaction characteristics, enabling robust haplotype imputation and intricate haplotype-specific 3D structure discovery. Dip3D outperforms previous methods in data utilization rate, contact matrix resolution, and completeness by one order of magnitude. Moreover, Dip3D allows capturing haplotype high-order interactions that are unseen in Hi-C type data. We demonstrated the identified haplotype substructures such as Topologically Associating Domains (TADs) in the constructed 3D human genome, and unraveled connections between genic haplotype-specific high-order interactions and imbalanced allelic expression.

### Competing Interest Statement

The authors have declared no competing interest.

The in-house Pore-C data of HG001, HG002 and F1 mice have been deposited in the Genome Sequence Archive (GSA) at the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences (BIG, ), under the Project Accession No. “PRJCA018069”. The GSA-Human and GSA Accession Nos. for the data are “HRA004983” and “CRA011676”, respectively. The Pore-C model for Clair3 is available at [http://www.bio8.cs.hku.hk/porec/clair3\_porec\_model.zip][1]. The ONT data of HG001 and HG002 are available at [https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=NHGRI\_UCSC\_panel][2]. The GIAB v3.3.2 ‘truth’ VCFs of HG001 and HG002 were obtained from and . The region stratifications of GRCh38 reference genome are available at .
The processed Hi-C data of HG001 were obtained from Rao et al. (2014), including diploid Hi-C matrices (GSE63525) and 531× alignment files (). The SNV VCF files for the parental strains (C57BL/6J and PWK/PhJ) of the F1 mice are available from the Mouse Genomes Project ([https://ftp.ebi.ac.uk/pub/databases/mousegenomes/REL-2112-v8-SNPs\_Indels/mgp\_REL2021_snps.vcf.gz][3]). Diploid HG001 Hi-C matrices from Rao et al. (2014) are available from GSE63525. The Hi-C data of C57BL/6J × PWK/PhJ mice are available in the NCBI SRA database under SRA accession IDs SRR5122741 and SRR5122742.

[1]: http://www.bio8.cs.hku.hk/porec/clair3_porec_model.zip
[2]: https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=NHGRI_UCSC_panel
[3]: https://ftp.ebi.ac.uk/pub/databases/mousegenomes/REL-2112-v8-SNPs_Indels/mgp_REL2021_snps.vcf.gz