Siwei Chen, Laurent C. Francioli, Julia K. Goodrich, Ryan L. Collins, Masahiro Kanai, Qingbo Wang, Jessica Alföldi, Nicholas A. Watts, Christopher Vittal, Laura D. Gauthier, Timothy Poterba, Michael W. Wilson, Yekaterina Tarasova, William Phu, Riley Grant, Mary T. Yohannes, Zan Koenig, Yossi Farjoun, Eric Banks, Stacey Donnelly, Stacey Gabriel, Namrata Gupta, Steven Ferriera, Charlotte Tolonen, Sam Novod, Louis Bergelson, David Roazen, Valentin Ruano-Rubio, Miguel Covarrubias, Christopher Llanwarne, Nikelle Petrillo, Gordon Wade, Thibault Jeandet, Ruchi Munshi, Kathleen Tibbetts, Anne O’Donnell-Luria, Matthew Solomonson, Cotton Seed, Alicia R. Martin, Michael E. Talkowski, Heidi L. Rehm, Mark J. Daly, Grace Tiao, Benjamin M. Neale, Daniel G. MacArthur, Konrad J. Karczewski, Maria Abreu, Carlos A. Aguilar Salinas, Tariq Ahmad, Christine M. Albert, Diego Ardissino, Irina M. Armean, Elizabeth G. Atkinson, Gil Atzmon, John Barnard, Samantha M. Baxter, Laurent Beaugerie, Emelia J. Benjamin, David Benjamin, Michael Boehnke, Lori L. Bonnycastle, Erwin P. Bottinger, Donald W. Bowden, Matthew J. Bown, Harrison Brand, Steven Brant, Ted Brookings, Sam Bryant, Sarah E. Calvo, Hannia Campos, John C. Chambers, Juliana C. Chan, Katherine R. Chao, Sinéad Chapman, Daniel I. Chasman, Rex Chisholm, Judy Cho, Rajiv Chowdhury, Mina K. Chung, Wendy K. Chung, Kristian Cibulskis, Bruce Cohen, Kristen M. Connolly, Adolfo Correa, Beryl B. Cummings, Dana Dabelea, John Danesh, Dawood Darbar, Phil Darnowsky, Joshua Denny, Ravindranath Duggirala, Josée Dupuis, Patrick T. Ellinor, Roberto Elosua, James Emery, Eleina England, Jeanette Erdmann, Tõnu Esko, Emily Evangelista, Diane Fatkin, Jose Florez, Andre Franke, Jack Fu, Martti Färkkilä, Kiran Garimella, Jeff Gentry, Gad Getz, David C. Glahn, Benjamin Glaser, Stephen J. Glatt, David Goldstein, Clicerio Gonzalez, Leif Groop, Sanna Gudmundsson, Andrea Haessly, Christopher Haiman, Ira Hall, Craig L. Hanis, Matthew Harms, Mikko Hiltunen, Matti M. Holi, Christina M. Hultman, Chaim Jalas, Mikko Kallela, Diane Kaplan, Jaakko Kaprio, Sekar Kathiresan, Eimear E. Kenny, Bong-Jo Kim, Young Jin Kim, Daniel King, George Kirov, Jaspal Kooner, Seppo Koskinen, Harlan M. Krumholz, Subra Kugathasan, Soo Heon Kwak, Markku Laakso, Nicole Lake, Trevyn Langsford, Kristen M. Laricchia, Terho Lehtimäki, Monkol Lek, Emily Lipscomb, Ruth J. F. Loos, Wenhan Lu, Steven A. Lubitz, Teresa Tusie Luna, Ronald C. W. Ma, Gregory M. Marcus, Jaume Marrugat, Kari M. Mattila, Steven McCarroll, Mark I. McCarthy, Jacob L. McCauley, Dermot McGovern, Ruth McPherson, James B. Meigs, Olle Melander, Andres Metspalu, Deborah Meyers, Eric V. Minikel, Braxton D. Mitchell, Vamsi K. Mootha, Aliya Naheed, Saman Nazarian, Peter M. Nilsson, Michael C. O’Donovan, Yukinori Okada, Dost Ongur, Lorena Orozco, Michael J. Owen, Colin Palmer, Nicholette D. Palmer, Aarno Palotie, Kyong Soo Park, Carlos Pato, Ann E. Pulver, Dan Rader, Nazneen Rahman, Alex Reiner, Anne M. Remes, Dan Rhodes, Stephen Rich, John D. Rioux, Samuli Ripatti, Dan M. Roden, Jerome I. Rotter, Nareh Sahakian, Danish Saleheen, Veikko Salomaa, Andrea Saltzman, Nilesh J. Samani, Kaitlin E. Samocha, Alba Sanchis-Juan, Jeremiah Scharf, Molly Schleicher, Heribert Schunkert, Sebastian Schönherr, Eleanor G. Seaby, Svati H. Shah, Megan Shand, Ted Sharpe, Moore B. Shoemaker, Tai Shyong, Edwin K. Silverman, Moriel Singer-Berk, Pamela Sklar, Jonathan T. Smith, J. Gustav Smith, Hilkka Soininen, Harry Sokol, Rachel G. Son, Jose Soto, Tim Spector, Christine Stevens, Nathan O. Stitziel, Patrick F. Sullivan, Jaana Suvisaari, E. Shyong Tai, Kent D. Taylor, Yik Ying Teo, Ming Tsuang, Tiinamaija Tuomi, Dan Turner, Teresa Tusie-Luna, Erkki Vartiainen, Marquis Vawter, Lily Wang, Arcturus Wang, James S. Ware, Hugh Watkins, Rinse K. Weersma, Ben Weisburd, Maija Wessman, Nicola Whiffin, James G. Wilson, Ramnik J. Xavier, Genome Aggregation Database Consortium

Abstract: The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1,2,3,4], but assessing constraint for non-protein-coding regions has proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a constraint map for the whole genome, termed Gnocchi (genomic non-coding constraint of haploinsufficient variation). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and for variants that are implicated in complex human diseases and traits, which makes it possible to triangulate biological annotation, disease association and natural selection in the analysis of non-coding DNA. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.
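
To make "depletion of variation" concrete, the following is a minimal sketch of how an observed variant count in a genomic window might be compared with a count expected from a mutational model. It is an illustration only, not the authors' model: the function names, the toy per-context mutation rates, and the choice of a simple signed z-score are all assumptions made for this example, and the regional genomic features mentioned in the abstract are left out entirely.

```python
import math


def expected_variants(context_counts, mutation_rates):
    """Expected variant count for a window: sum of (sites in context * rate).

    Both arguments are hypothetical inputs for this sketch; a real model
    would also adjust for regional genomic features.
    """
    return sum(n * mutation_rates[ctx] for ctx, n in context_counts.items())


def depletion_z(observed, expected):
    """Signed z-score, positive when fewer variants are seen than expected.

    Assumes a simple (expected - observed) / sqrt(expected) scaling.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    return (expected - observed) / math.sqrt(expected)


# Toy window: number of sites per trinucleotide context (made-up values).
window_contexts = {"ACG": 40, "TTA": 100, "GCA": 60}
# Made-up per-site probabilities of observing a variant in each context.
toy_rates = {"ACG": 0.5, "TTA": 0.1, "GCA": 0.2}

expected = expected_variants(window_contexts, toy_rates)  # 20 + 10 + 12 = 42
z = depletion_z(observed=29, expected=expected)
print(f"expected={expected:.1f}, z={z:.2f}")
```

In this toy window fewer variants are observed than expected, so the score is positive, which is the direction a constraint metric would read as depletion. A window with more variants than expected would score negative.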

Curated variation benchmarks for challenging medically relevant autosomal genes

The Platinum Pedigree: A long-read benchmark for genetic variants

Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations

Characterizing the Genetic Polymorphisms in 370 Challenging Medically Relevant Genes Using Long-Read Sequencing Data from 41 Human Individuals among 19 Global Populations

A robust benchmark for detecting low-frequency variants in the HG002 Genome In A Bottle NIST reference material

Assessing structural variation in a personal genome—towards a human reference diploid genome

Closing the gap: Solving complex medically relevant genes at scale

Analysis of protein-coding genetic variation in 60,706 humans

Benchmarking of Germline Copy Number Variant Callers from Whole Genome Sequencing Data for Clinical Applications

Systematic benchmark of state-of-the-art variant calling pipelines identifies major factors affecting accuracy of coding sequence variant discovery

Analysis and benchmarking of small and large genomic variants across tandem repeats

A deep catalogue of protein-coding variation in 983,578 individuals

Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project

Rare coding variant analysis for human diseases across biobanks and ancestries

A genomic mutational constraint map using variation in 76,156 human genomes

Rare copy number variants in over 100,000 European ancestry subjects reveal multiple disease associations

An Integrated Map of Structural Variation in 2,504 Human Genomes

Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation

High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation

Identification of copy number variants in whole-genome data using Reference Coverage Profiles

The benefit of a complete reference genome for cancer structural variant analysis