Genotype Imputation of Metabochip SNPs Using a Study‐Specific Reference Panel of ∼4,000 Haplotypes in African Americans from the Women's Health Initiative
Eric Yi Liu, Steven Buyske, Aaron K. Aragaki, Ulrike Peters, Eric Boerwinkle, Chris Carlson, Cara Carty, Dana C. Crawford, Jeff Haessler, Lucia A. Hindorff, Loic Le Marchand, Teri A. Manolio, Tara Matise, Wei Wang, Charles Kooperberg, Kari E. North, Yun Li
DOI: https://doi.org/10.1002/gepi.21603
2012-01-01
Abstract: Genetic imputation has become standard practice in modern genetic studies. However, several important issues have not been adequately addressed, including the utility of a study‐specific reference panel, performance in admixed populations, and quality for less common (minor allele frequency [MAF] 0.005–0.05) and rare (MAF < 0.005) variants. These issues have only recently become addressable, through genome‐wide association study (GWAS) follow‐up studies that use dense genotyping or sequencing in large samples of non‐European individuals. In this work, we constructed a study‐specific reference panel of 3,924 haplotypes using African Americans in the Women's Health Initiative (WHI) genotyped on both the Metabochip and the Affymetrix 6.0 GWAS platform. We used this reference panel to impute into 6,459 WHI SNP Health Association Resource (SHARe) study subjects with only GWAS genotypes. Our analysis confirmed the imputation quality metric Rsq (estimated r², specific to each SNP) as an effective post‐imputation filter. We recommend different Rsq thresholds for different MAF categories, chosen such that the average Rsq across the SNPs that pass is above the desired dosage r² (squared Pearson correlation between imputed and experimental genotypes). With a desired dosage r² of 80%, 99.9%, 97.5%, 83.6%, 52.0%, and 20.5% of SNPs with MAF > 0.05, 0.03–0.05, 0.01–0.03, 0.005–0.01, and 0.001–0.005, respectively, passed the post‐imputation filter. The average dosage r² for these SNPs was 94.7%, 92.1%, 89.0%, 83.1%, and 79.7%, respectively. These results suggest that, for African Americans, imputation of Metabochip SNPs from GWAS data, including low‐frequency SNPs with MAF 0.005–0.05, is feasible and worthwhile for increasing power in downstream association analysis, provided a sizable reference panel is available.
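The two quantities in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract states only the criterion (the average Rsq of retained SNPs should exceed the desired dosage r²), so the search procedure is an assumption, as are the hypothetical function names `dosage_r2` and `pick_rsq_threshold`. In practice the threshold search would be run separately on the SNPs of each MAF category.

```python
def dosage_r2(imputed, observed):
    """Squared Pearson correlation between imputed dosages and
    experimentally determined genotypes for one SNP."""
    n = len(imputed)
    mean_x = sum(imputed) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in imputed)
    syy = sum((y - mean_y) ** 2 for y in observed)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either vector has no variation
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, observed))
    return sxy * sxy / (sxx * syy)


def pick_rsq_threshold(rsq_values, target):
    """Assumed procedure: find the most permissive cutoff t such that the
    mean Rsq of SNPs with Rsq >= t is still at least `target`.

    Returns (threshold, fraction_of_snps_passing), or None if no cutoff works.
    """
    ordered = sorted(rsq_values, reverse=True)
    best = None
    running = 0.0
    for i, value in enumerate(ordered):
        running += value
        # Only evaluate at the end of a run of equal values, so the cutoff
        # keeps every SNP that ties with it.
        is_group_end = i + 1 == len(ordered) or ordered[i + 1] != value
        if is_group_end and running / (i + 1) >= target:
            best = (value, (i + 1) / len(ordered))
    return best


# Made-up Rsq values for one MAF category (illustrative only).
example_rsq = [0.3, 0.95, 0.6, 0.9, 0.85]
print(pick_rsq_threshold(example_rsq, target=0.8))
```

Because the running mean of the sorted values can only decrease as lower-quality SNPs are added, the last cutoff that satisfies the criterion is the one that retains the most SNPs.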