Abstract: Here, 622 imputations were conducted with 394 customized reference panels for Han Chinese and European populations. Besides confirming that imputation accuracy always benefits from a larger panel when the reference panel is population-specific, the results suggest two new observations. First, when the haplotype size of the reference panel was fixed, the imputation accuracy of common and low-frequency variants (MAF>0.5%) decreased as the population diversity of the reference panel increased; for rare variants (MAF<0.5%), however, a small fraction of diversity (<20%) in the panel could improve imputation accuracy. Second, when the haplotype size of the reference panel was increased by adding extra population-diverse samples, the imputation accuracy of common variants (MAF>5%) in the European population always benefited from the larger sample size. For the Han Chinese population, in contrast, the accuracy of all imputed variants was highest when the reference panel contained a fraction of extra diverse samples (15%∼21%). In addition, we evaluated existing reference panels, namely HRC, 1000G Phase3, and CONVERGE. For the European population, HRC was the best reference panel. For the Han Chinese population, we propose an optimum constituent ratio for researchers who wish to customize their own sequenced reference panel, but a high-quality, large-scale Chinese reference panel is still needed. Our findings could be generalized to other populations with a conservative genome, and a tool is provided to investigate other populations of interest (https://github.com/Abyss-bai/reference-panel-reconstruction).

Highlights (Key points)

A total of 394 reference panels were designed and customized using three strategies, and large-scale genotype imputations were performed with these panels for systematic evaluation in Han Chinese and European populations.

For the Han Chinese population, when the haplotype size of the reference panel was increased with extra samples (the most common case), the accuracy of imputed variants was highest when the panel contained a fraction of extra diverse samples (15%∼21%).

Imputation accuracy showed different trends in the Han Chinese and European populations. In a sense, the European genome may itself be more diverse than the Han Chinese genome.

Existing reference panels were not the best choice for Chinese imputation; a high-quality, large-scale Chinese reference panel is still needed.
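As a rough illustration of the constituent ratio mentioned above, the arithmetic for sizing the diverse portion of a panel can be sketched as follows. This is not the authors' tool: the function names are hypothetical, the sample counts are made-up inputs, and the sketch assumes that the diverse fraction is measured relative to the final panel and that each sample is diploid and contributes two haplotypes.

```python
def extra_diverse_samples(n_specific: int, diverse_fraction: float) -> int:
    """Diverse samples to add so they form `diverse_fraction` of the final panel.

    Solves n_div / (n_specific + n_div) = diverse_fraction for n_div
    (assumption: the fraction is defined over the final panel size).
    """
    if not 0.0 <= diverse_fraction < 1.0:
        raise ValueError("diverse_fraction must be in [0, 1)")
    return round(n_specific * diverse_fraction / (1.0 - diverse_fraction))


def panel_summary(n_specific: int, diverse_fraction: float) -> dict:
    """Summarize a hypothetical panel built from population-specific plus diverse samples."""
    n_div = extra_diverse_samples(n_specific, diverse_fraction)
    total = n_specific + n_div
    return {
        "specific_samples": n_specific,
        "diverse_samples": n_div,
        "haplotypes": 2 * total,  # assumes diploid samples
        "diverse_fraction": n_div / total,
    }


if __name__ == "__main__":
    # Hypothetical input: 400 population-specific samples, bounds of the 15% to 21% range.
    for frac in (0.15, 0.21):
        print(frac, panel_summary(400, frac))
```

Under these assumptions, the realized fraction differs slightly from the target because sample counts are rounded to whole individuals.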
