The Diploid Genome Sequence of an Asian Individual

Jun Wang, Wei Wang, Ruiqiang Li, Yingrui Li, Geng Tian, Laurie Goodman, Wei Fan, Junqing Zhang, Jun Li, Juanbin Zhang, Yiran Guo, Binxiao Feng, Heng Li, Yao Lu, Xiaodong Fang, Huiqing Liang, Zhenglin Du, Dong Li, Yiqing Zhao, Yujie Hu, Zhenzhen Yang, Hancheng Zheng, Ines Hellmann, Michael Inouye, John Pool, Xin Yi, Jing Zhao, Jinjie Duan, Yan Zhou, Junjie Qin, Lijia Ma, Guoqing Li, Zhentao Yang, Guojie Zhang, Bin Yang, Chang Yu, Fang Liang, Wenjie Li, Shaochuan Li, Dawei Li, Peixiang Ni, Jue Ruan, Qibin Li, Hongmei Zhu, Dongyuan Liu, Zhike Lu, Ning Li, Guangwu Guo, Jianguo Zhang, Jia Ye, Lin Fang, Qin Hao, Quan Chen, Yu Liang, Yeyang Su, A. san, Cuo Ping, Shuang Yang, Fang Chen, Li, Ke Zhou, Hongkun Zheng, Yuanyuan Ren, Ling Yang, Yang Gao, Guohua Yang, Zhuo Li, Xiaoli Feng, Karsten Kristiansen, Gane Ka-Shu Wong, Rasmus Nielsen, Richard Durbin, Lars Bolund, Xiuqing Zhang, Songgang Li, Huanming Yang, Jian Wang
DOI: https://doi.org/10.1038/nature07484
IF: 64.8
2008-01-01
Nature
Abstract: Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual’s genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.
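To put the abstract's headline figures in absolute terms, the following is a rough back-of-envelope sketch, not a calculation from the paper. The SNP count, the 13.6% fraction, and the 36-fold depth are taken from the abstract; the 3 Gb genome size and the helper names `novel_snps` and `sequenced_bases` are assumptions introduced here for illustration only.

```python
# Back-of-envelope figures implied by the abstract (illustrative only).
SNPS_TOTAL = 3_000_000          # "approximately 3 million" SNPs (from the abstract)
NOVEL_FRACTION = 0.136          # 13.6% not in dbSNP (from the abstract)
FOLD_COVERAGE = 36              # average sequencing depth (from the abstract)
GENOME_SIZE_BP = 3_000_000_000  # ASSUMPTION: ~3 Gb haploid genome, not stated in the abstract


def novel_snps(total: int, fraction: float) -> int:
    """Approximate number of SNPs absent from dbSNP."""
    return round(total * fraction)


def sequenced_bases(depth: int, genome_size: int) -> int:
    """Approximate bases needed for a given average depth (depth x genome size)."""
    return depth * genome_size


if __name__ == "__main__":
    print("Novel SNPs (approx.):", novel_snps(SNPS_TOTAL, NOVEL_FRACTION))
    print("Bases for 36-fold depth (approx.):", sequenced_bases(FOLD_COVERAGE, GENOME_SIZE_BP))
```

Under these assumptions, roughly 408,000 of the reported SNPs would be absent from dbSNP, and 36-fold depth would correspond to on the order of 108 billion sequenced bases. Both numbers are order-of-magnitude estimates because the inputs are rounded.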