Mapping and characterization of structural variation in 17,795 deeply sequenced human genomes

Haley J. Abel, David E. Larson, Colby Chiang, Indraniel Das, Krishna L. Kanchi, Ryan M. Layer, Benjamin M. Neale, William J. Salerno, Catherine Reeves, Steven Buyske, Tara C. Matise, Donna M. Muzny, Michael C. Zody, Eric S. Lander, Susan K. Dutcher, Nathan O. Stitziel, Ira M. Hall
DOI: https://doi.org/10.1101/508515
2018-12-31
Abstract: A key goal of whole genome sequencing (WGS) for human genetics studies is to interrogate all forms of variation, including single nucleotide variants (SNV), small insertion/deletion (indel) variants and structural variants (SV). However, tools and resources for the study of SV have lagged behind those for smaller variants. Here, we used a cloud-based pipeline to map and characterize SV in 17,795 deeply sequenced human genomes from common disease trait mapping studies. We publicly release site-frequency information to create the largest WGS-based SV resource to date. On average, individuals carry 2.9 rare SVs that alter coding regions, which affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Based on a computational model, we estimate that SVs account for 17.2% of rare alleles genome-wide whose predicted deleterious effects are equivalent to loss-of-function (LoF) coding alleles; ~90% of such SVs are non-coding deletions (mean 19.1 per genome). We report 158,991 ultra-rare SVs and show that ~2% of individuals carry ultra-rare megabase-scale SVs, nearly half of which are balanced and/or complex rearrangements. Finally, we exploit this resource to infer the dosage sensitivity of genes and non-coding elements, revealing strong trends related to regulatory element class, conservation and cell-type specificity. This work will help guide SV analysis and interpretation in the era of WGS.