The landscape of gene loss and missense variation across the mammalian tree informs on gene essentiality

Calwing Liao, Robert Ye, Franjo Ivankovic, Jack Fu, Raymond K Walters, Chelsea Lowther, Elise Valkanas, Claire Churchhouse, Kaitlin E Samocha, Kerstin Lindblad-Toh, Elinor Karlsson, Michael Hiller, Michael E Talkowski, Benjamin Neale
DOI: https://doi.org/10.1101/2024.05.16.594531
2024-05-19
Abstract:
Background: The degree of gene and sequence preservation across species provides valuable insights into the relative necessity of genes from the perspective of natural selection. Here, we developed novel interspecies metrics across 462 mammalian species, GISMO (Gene identity score of mammalian orthologs) and GISMO-mis (GISMO-missense), to quantify gene loss across millions of years of evolution. GISMO is a measure of gene loss across mammals weighted by evolutionary distance relative to humans, whereas GISMO-mis quantifies the ratio of missense to synonymous variants across mammalian species for a given gene.
Rationale: Despite large sample sizes, current human constraint metrics are still not well calibrated for short genes. Surveying more than 100 million years of evolution across hundreds of mammals can identify the most essential genes and improve gene-disease association. Beyond human genetics, these metrics provide measures of gene constraint to further enable mammalian genetics research.
Results: Our analyses showed that both metrics are strongly correlated with measures of human gene constraint for loss-of-function, missense, and copy number dosage derived from upwards of a million human samples, which highlights the power of interspecies constraint. Importantly, neither GISMO nor GISMO-mis is strongly correlated with coding sequence length. Both metrics can therefore identify novel constrained genes that were too short for existing human constraint metrics to capture. We also found that GISMO scores capture rare variant association signals across a range of phenotypes associated with decreased fecundity, such as schizophrenia, autism, and neurodevelopmental disorders. Moreover, common variant heritability of disease traits is highly enriched in the most constrained deciles of both metrics, further underscoring the biological relevance of these metrics in identifying functionally important genes. We further showed that, for copy number variants in the UK Biobank, the most constrained deciles of both scores have the lowest duplication and deletion rates, suggesting that both scores may be important metrics for dosage sensitivity. We additionally demonstrate that GISMO can improve prioritization of recessive disorder genes and captures homozygous selection.
Conclusions: Overall, we demonstrate that the most constrained genes for gene loss and missense variation capture the largest fraction of heritability, and that GISMO can help prioritize recessive disorder genes and identify the most conserved genes across the mammalian tree.
Genetics