Yield of genetic association signals from genomes, exomes and imputation in the UK Biobank

Sheila M. Gaynor, Tyler Joseph, Xiaodong Bai, Yuxin Zou, Boris Boutkov, Evan K. Maxwell, Olivier Delaneau, Robin J. Hofmeister, Olga Krasheninina, Suganthi Balasubramanian, Anthony Marcketta, Joshua Backman, Jeffrey G. Reid, John D. Overton, Luca A. Lotta, Jonathan Marchini, William J. Salerno, Aris Baras, Goncalo R. Abecasis, Timothy A. Thornton
DOI: https://doi.org/10.1038/s41588-024-01930-4
Impact factor: 30.8
2024-09-25
Nature Genetics
Abstract: Whole-genome sequencing (WGS), whole-exome sequencing (WES) and array genotyping with imputation (IMP) are common strategies for assessing genetic variation and its association with medically relevant phenotypes. To date, there has been no systematic empirical assessment of the yield of these approaches when applied to hundreds of thousands of samples to discover complex trait genetic signals. Using data for 100 complex traits from 149,195 individuals in the UK Biobank, we systematically compare the relative yield of these strategies in genetic association studies. We find that WGS and WES combined with arrays and imputation (WES + IMP) have the largest association yields. Although WGS results in an approximately fivefold increase in the total number of assayed variants over WES + IMP, the number of detected signals differed by only 1% for both single-variant and gene-based association analyses. Given that WES + IMP typically saves laboratory and computational time and resources per sample, we evaluate the potential benefits of applying WES + IMP to larger samples. When we extend our WES + IMP analyses to 468,169 UK Biobank individuals, we observe an approximately fourfold increase in association signals with the threefold increase in sample size. We conclude that prioritizing WES + IMP and large sample sizes, rather than contemporary short-read WGS alternatives, will maximize the number of discoveries in genetic association studies.
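The sample-size comparison in the abstract can be checked with simple arithmetic. The sketch below is a minimal illustration using only the two sample sizes stated in the abstract; the variable names are hypothetical, and the fourfold signal figure is the abstract's rounded approximation, not a reported count of signals.

```python
# Sample sizes stated in the abstract (UK Biobank individuals).
# Variable names are illustrative only.
WGS_COMPARISON_N = 149_195    # individuals in the WGS vs WES + IMP comparison
WES_IMP_EXTENDED_N = 468_169  # individuals in the extended WES + IMP analysis

# Fold change in sample size between the two analyses.
sample_fold = WES_IMP_EXTENDED_N / WGS_COMPARISON_N

# Approximate fold change in association signals, as rounded in the abstract.
APPROX_SIGNAL_FOLD = 4.0

print(f"sample size fold change: {sample_fold:.2f}")
print(f"additional individuals: {WES_IMP_EXTENDED_N - WGS_COMPARISON_N:,}")
print(f"signals grew proportionally more than sample size: {APPROX_SIGNAL_FOLD > sample_fold}")
```

The sample size grows by a factor of about 3.14 (318,974 additional individuals), so the reported roughly fourfold gain in signals outpaces the growth in sample size.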
genetics & heredity