A High-resolution Haplotype-resolved Reference Panel Constructed from the China Kadoorie Biobank Study
Canqing Yu, Xianmei Lan, Ye Tao, Yu Guo, Dianjianyi Sun, Puyi Qian, Yuwen Zhou, Robin Walters, Linxuan Li, Iona Millwood, Jingyu Zeng, Pei Pei, Ruidong Guo, Huaidong Du, Tao Yang, Ling Yang, Fan Yang, Yiping Chen, Fengzhen Chen, Xiaosen Jiang, Zhiqiang Ye, Fangyi Ren, Lanlan Dai, Xiaofeng Wei, Xun Xu, Huanming Yang, Jian Wang, Zhengming Chen, Huanhuan Zhu, Jun Lv, Xin Jin, Liming Li
DOI: https://doi.org/10.1101/2022.12.14.22283491
IF: 14.9
2022-01-01
Nucleic Acids Research
Abstract: Precision medicine relies on highly accurate individual-level genotype data. However, whole-genome sequencing (WGS) is currently not feasible for studies with very large sample sizes because of budget constraints, so it is particularly important to construct a highly accurate haplotype reference panel for genotype imputation. In this study, we selected 9,950 individuals from the China Kadoorie Biobank (CKB) cohort and 50 Chinese samples from the 1000 Genomes Project (1KGP) for medium-depth WGS to construct a CKB reference panel. When used to impute microarray datasets, the CKB panel outperformed the extended high-coverage 1KGP, TOPMed, ChinaMAP, and NyuWa panels in both the number of well-imputed variants and imputation accuracy. In addition, we have imputed over 100,000 CKB microarray samples with the CKB panel; the resulting imputed genotype dataset is the largest whole-genome dataset of the Chinese population to date. Finally, we developed an online server that offers a free genotype imputation service based on the CKB reference panel (<https://db.cngb.org/imputation/>). We believe that the CKB reference panel is of great value for imputing microarray or low-depth genotype data from the Chinese population. The imputed data for the 100,000 microarray samples are a fundamental resource for population genetic studies of complex traits and diseases in the Chinese population.
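The abstract compares panels by "imputation accuracy" and by the number of "well-imputed variants" without defining either here. As a minimal sketch of one common way such metrics are computed, the Python below scores a variant by the squared Pearson correlation between sequenced genotypes and imputed dosages, then counts variants above a cutoff. The metric choice, the 0.8 threshold, and the function names `dosage_r2` and `count_well_imputed` are illustrative assumptions, not details taken from this study.

```python
from math import isnan


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2 alt-allele
    counts) and imputed dosages (floats in [0, 2]) for a single variant.

    Assumption: this is one common accuracy metric, not necessarily the
    one used by the authors.
    """
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_dosages):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        # Correlation is undefined when either vector has no variation.
        return float("nan")
    return cov * cov / (var_t * var_d)


def count_well_imputed(r2_by_variant, threshold=0.8):
    """Count variants whose r2 meets the threshold.

    The default of 0.8 is a hypothetical cutoff chosen for illustration.
    Variants with an undefined (NaN) r2 are not counted.
    """
    return sum(1 for r2 in r2_by_variant.values()
               if not isnan(r2) and r2 >= threshold)


# Toy data: four samples at two hypothetical variants.
r2_values = {
    "var1": dosage_r2([0, 1, 2, 1], [0.1, 0.9, 1.9, 1.2]),
    "var2": dosage_r2([0, 0, 1, 2], [1.0, 0.2, 0.3, 0.9]),
}
n_well_imputed = count_well_imputed(r2_values)
```

Under these assumptions, comparing two reference panels amounts to running the same calculation on each panel's imputed output against the same sequenced truth set and comparing the per-variant scores and the resulting counts.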
### Competing Interest Statement
The authors have declared no competing interest.
### Funding Statement
This work was supported by grants (2016YFC0900500) from the National Key R&D Program of China, National Natural Science Foundation of China (32000398, 82192901, 82192904, 82192900), the China National GeneBank, Guangdong Provincial Key Laboratory of Genome Read and Write (2017B030301011) and Guangdong Provincial Academician Workstation of BGI Synthetic Genomics (2017B090904014). The CKB baseline survey and the first re-survey were supported by a grant from the Kadoorie Charitable Foundation in Hong Kong. The long-term follow-up is supported by grants from the UK Wellcome Trust (212946/Z/18/Z, 202922/Z/16/Z, 104085/Z/14/Z, 088158/Z/09/Z), National Natural Science Foundation of China (81390540, 91846303, 81941018), and Chinese Ministry of Science and Technology (2011BAI09B01). The funders had no role in the study design, data collection, data analysis and interpretation, writing of the report, or the decision to submit the article for publication.
### Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Ethics committee/IRB of Beijing Genomics Institute (BGI) gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
The CKB reference panel and the imputed data for over 100,000 CKB microarray samples have been deposited in the CNGB Sequence Archive (CNSA) of the China National GeneBank DataBase (CNGBdb) under accession number CNP0003405. All genotype data are shared under controlled access. All data produced are available online at
<https://db.cngb.org/cnsa/>