A high-resolution haplotype-resolved Reference panel constructed from the China Kadoorie Biobank Study

Canqing Yu, Xianmei Lan, Ye Tao, Yu Guo, Dianjianyi Sun, Puyi Qian, Yuwen Zhou, Robin Walters, Linxuan Li, Yunqing Zhu, Jingyu Zeng, Iona Y. Millwood, Ruidong Guo, Pei Pei, Tao Yang, Huaidong Du, Fan Yang, Ling Yang, Fangyi Ren, Yiping Chen, Fengzhen Chen, Xiaoying Jiang, Zhiqiang Ye, Lanlan Dai, Xiaofeng Wei, Xun Xu, Huanming Yang, Jian Wang, Zhengming Chen, Huanhuan Zhu, Jun Lv, Xin Jin, Liming Li
DOI: https://doi.org/10.1093/nar/gkad779
IF: 14.9
2023-01-01
Nucleic Acids Research
Abstract: Precision medicine depends on high-accuracy individual-level genotype data. However, whole-genome sequencing (WGS) is still not feasible for very large studies due to budget constraints. It is therefore particularly important to construct a highly accurate haplotype reference panel for genotype imputation. In this study, we used 10 000 samples with medium-depth WGS to construct a reference panel that we named the CKB reference panel. When imputing microarray datasets, the CKB panel outperformed the compared panels in terms of both the number of well-imputed variants and imputation accuracy. In addition, we have completed the imputation of 100 706 microarrays with the CKB panel, and the imputed data are to date the largest whole-genome dataset of the Chinese population. Furthermore, in a GWAS of a real phenotype, height, the number of tested SNPs tripled and the number of significant SNPs doubled after imputation. Finally, we developed an online server offering a free genotype imputation service based on the CKB reference panel (https://db.cngb.org/imputation/). We believe that the CKB panel is of great value for imputing microarray or low-coverage genotype data of the Chinese population, and potentially of mixed populations. The 100 706 imputed microarray datasets are a large and valuable resource for population genetic studies of complex traits and diseases.