A deep catalogue of protein-coding variation in 983,578 individuals
Kathie Y. Sun, Xiaodong Bai, Siying Chen, Suying Bao, Chuanyi Zhang, Manav Kapoor, Joshua Backman, Tyler Joseph, Evan Maxwell, George Mitra, Alexander Gorovits, Adam Mansfield, Boris Boutkov, Sujit Gokhale, Lukas Habegger, Anthony Marcketta, Adam E. Locke, Liron Ganel, Alicia Hawes, Michael D. Kessler, Deepika Sharma, Jeffrey Staples, Jonas Bovijn, Sahar Gelfman, Alessandro Di Gioia, Veera M. Rajagopal, Alexander Lopez, Jennifer Rico Varela, Jesus Alegre, Jaime Berumen, Roberto Tapia-Conyer, Pablo Kuri-Morales, Jason Torres, Jonathan Emberson, Rory Collins, Michael Cantor, Timothy Thornton, Hyun Min Kang, John D. Overton, Alan R. Shuldiner, M. Laura Cremona, Mona Nafde, Aris Baras, Goncalo Abecasis, Jonathan Marchini, Jeffrey G. Reid, William Salerno, Suganthi Balasubramanian
DOI: https://doi.org/10.1038/s41586-024-07556-0
IF: 64.8
2024-05-21
Nature
Abstract: Rare coding variants that significantly impact function provide insights into the biology of a gene (refs. 1-3). However, ascertaining their frequency requires large sample sizes (refs. 4-8). Here, we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. 23% of the Regeneron Genetics Center Million Exome data (RGC-ME) comes from non-European individuals of African, East Asian, Indigenous American, Middle Eastern, and South Asian ancestry. This catalogue includes over 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss-of-function, we identify 3,988 loss-of-function intolerant genes, including 86 that were previously assessed as tolerant and 1,153 lacking established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions depleted of missense variants despite being tolerant to pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this important resource of coding variation from the RGC-ME accessible via a public variant allele frequency browser.
multidisciplinary sciences