T2T-YAO: A Telomere-to-telomere Assembled Diploid Reference Genome for Han Chinese

Yukun He, Yanan Chu, Shuming Guo, Jiang Hu, Ran Li, Yali Zheng, Xinqian Ma, Zhenglin Du, Lili Zhao, Wenyi Yu, Jianbo Xue, Wenjie Bian, Feifei Yang, Xi Chen, Pingan Zhang, Rihan Wu, Yifan Ma, Changjun Shao, Jing Chen, Jian Wang, Jiwei Li, Jing Wu, Xiaoyi Hu, Qiuyue Long, Mingzheng Jiang, Hongli Ye, Shixu Song, Guangyao Li, Yue Wei, Yu Xu, Yanliang Ma, Yanwen Chen, Keqiang Wang, Jing Bao, Wen Xi, Fang Wang, Wentao Ni, Moqin Zhang, Yan Yu, Shengnan Li, Yu Kang, Zhancheng Gao
DOI: https://doi.org/10.1016/j.gpb.2023.08.001
2023-08-18
Genomics, Proteomics and Bioinformatics
Abstract: Since its initial release in 2001, the human reference genome has undergone continuous improvement in quality. The recently released telomere-to-telomere version, T2T-CHM13, reaches the highest level of continuity and accuracy after 20 years of effort, but it does so by working on a simplified, nearly homozygous genome of a hydatidiform mole cell line. To provide an authentic complete diploid human genome reference for the Han Chinese, the largest population in the world, we have assembled the genome of a male Han Chinese individual, T2T-YAO, which includes telomere-to-telomere assemblies of all the 22+X+M and 22+Y chromosomes in both haplotypes. The quality of T2T-YAO is much better than that of all currently available diploid assemblies, and its haploid version, T2T-YAO-hp, generated by selecting the better assembly for each autosome, reaches the top quality of fewer than one error per 29.5 Mb, even higher than that of T2T-CHM13. Derived from an individual living in the aboriginal region of the Han population, T2T-YAO shows clear ancestry and potential genetic continuity from the ancient ancestors. Each haplotype of T2T-YAO possesses ∼330 Mb of exclusive sequences, ∼3100 unique genes, and tens of thousands of nucleotide and structural variations as compared to CHM13, highlighting the necessity of population-stratified reference genomes. The construction of T2T-YAO, a truly accurate and authentic representative of the Chinese population, would enable precise delineation of genomic variations and advance our understanding of the heritability of diseases and phenotypes, especially within the context of the unique variations of the Chinese population.
genetics & heredity