A genomic mutational constraint map using variation in 76,156 human genomes
Siwei Chen,Laurent C. Francioli,Julia K. Goodrich,Ryan L. Collins,Masahiro Kanai,Qingbo Wang,Jessica Alföldi,Nicholas A. Watts,Christopher Vittal,Laura D. Gauthier,Timothy Poterba,Michael W. Wilson,Yekaterina Tarasova,William Phu,Riley Grant,Mary T. Yohannes,Zan Koenig,Yossi Farjoun,Eric Banks,Stacey Donnelly,Stacey Gabriel,Namrata Gupta,Steven Ferriera,Charlotte Tolonen,Sam Novod,Louis Bergelson,David Roazen,Valentin Ruano-Rubio,Miguel Covarrubias,Christopher Llanwarne,Nikelle Petrillo,Gordon Wade,Thibault Jeandet,Ruchi Munshi,Kathleen Tibbetts,Anne O’Donnell-Luria,Matthew Solomonson,Cotton Seed,Alicia R. Martin,Michael E. Talkowski,Heidi L. Rehm,Mark J. Daly,Grace Tiao,Benjamin M. Neale,Daniel G. MacArthur,Konrad J. Karczewski,Maria Abreu,Carlos A. Aguilar Salinas,Tariq Ahmad,Christine M. Albert,Diego Ardissino,Irina M. Armean,Elizabeth G. Atkinson,Gil Atzmon,John Barnard,Samantha M. Baxter,Laurent Beaugerie,Emelia J. Benjamin,David Benjamin,Michael Boehnke,Lori L. Bonnycastle,Erwin P. Bottinger,Donald W. Bowden,Matthew J. Bown,Harrison Brand,Steven Brant,Ted Brookings,Sam Bryant,Sarah E. Calvo,Hannia Campos,John C. Chambers,Juliana C. Chan,Katherine R. Chao,Sinéad Chapman,Daniel I. Chasman,Rex Chisholm,Judy Cho,Rajiv Chowdhury,Mina K. Chung,Wendy K. Chung,Kristian Cibulskis,Bruce Cohen,Kristen M. Connolly,Adolfo Correa,Beryl B. Cummings,Dana Dabelea,John Danesh,Dawood Darbar,Phil Darnowsky,Joshua Denny,Ravindranath Duggirala,Josée Dupuis,Patrick T. Ellinor,Roberto Elosua,James Emery,Eleina England,Jeanette Erdmann,Tõnu Esko,Emily Evangelista,Diane Fatkin,Jose Florez,Andre Franke,Jack Fu,Martti Färkkilä,Kiran Garimella,Jeff Gentry,Gad Getz,David C. Glahn,Benjamin Glaser,Stephen J. Glatt,David Goldstein,Clicerio Gonzalez,Leif Groop,Sanna Gudmundsson,Andrea Haessly,Christopher Haiman,Ira Hall,Craig L. Hanis,Matthew Harms,Mikko Hiltunen,Matti M. Holi,Christina M. Hultman,Chaim Jalas,Mikko Kallela,Diane Kaplan,Jaakko Kaprio,Sekar Kathiresan,Eimear E. Kenny,Bong-Jo Kim,Young Jin Kim,Daniel King,George Kirov,Jaspal Kooner,Seppo Koskinen,Harlan M. Krumholz,Subra Kugathasan,Soo Heon Kwak,Markku Laakso,Nicole Lake,Trevyn Langsford,Kristen M. Laricchia,Terho Lehtimäki,Monkol Lek,Emily Lipscomb,Ruth J. F. Loos,Wenhan Lu,Steven A. Lubitz,Teresa Tusie Luna,Ronald C. W. Ma,Gregory M. Marcus,Jaume Marrugat,Kari M. Mattila,Steven McCarroll,Mark I. McCarthy,Jacob L. McCauley,Dermot McGovern,Ruth McPherson,James B. Meigs,Olle Melander,Andres Metspalu,Deborah Meyers,Eric V. Minikel,Braxton D. Mitchell,Vamsi K. Mootha,Aliya Naheed,Saman Nazarian,Peter M. Nilsson,Michael C. O’Donovan,Yukinori Okada,Dost Ongur,Lorena Orozco,Michael J. Owen,Colin Palmer,Nicholette D. Palmer,Aarno Palotie,Kyong Soo Park,Carlos Pato,Ann E. Pulver,Dan Rader,Nazneen Rahman,Alex Reiner,Anne M. Remes,Dan Rhodes,Stephen Rich,John D. Rioux,Samuli Ripatti,Dan M. Roden,Jerome I. Rotter,Nareh Sahakian,Danish Saleheen,Veikko Salomaa,Andrea Saltzman,Nilesh J. Samani,Kaitlin E. Samocha,Alba Sanchis-Juan,Jeremiah Scharf,Molly Schleicher,Heribert Schunkert,Sebastian Schönherr,Eleanor G. Seaby,Svati H. Shah,Megan Shand,Ted Sharpe,Moore B. Shoemaker,Tai Shyong,Edwin K. Silverman,Moriel Singer-Berk,Pamela Sklar,Jonathan T. Smith,J. Gustav Smith,Hilkka Soininen,Harry Sokol,Rachel G. Son,Jose Soto,Tim Spector,Christine Stevens,Nathan O. Stitziel,Patrick F. Sullivan,Jaana Suvisaari,E. Shyong Tai,Kent D. Taylor,Yik Ying Teo,Ming Tsuang,Tiinamaija Tuomi,Dan Turner,Teresa Tusie-Luna,Erkki Vartiainen,Marquis Vawter,Lily Wang,Arcturus Wang,James S. Ware,Hugh Watkins,Rinse K. Weersma,Ben Weisburd,Maija Wessman,Nicola Whiffin,James G. Wilson,Ramnik J. Xavier,Genome Aggregation Database Consortium
DOI: https://doi.org/10.1038/s41586-023-06045-0
IF: 64.8
2023-12-07
Nature
Abstract: The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1,2,3,4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.
multidisciplinary sciences