Prioritizing disease-related rare variants by integrating gene expression data

Hanmin Guo, Alexander Eckehart Urban, Wing Hung Wong
DOI: https://doi.org/10.1371/journal.pgen.1011412
IF: 4.5
2024-10-01
PLoS Genetics
Abstract: Rare variants, which comprise the vast majority of human genetic variation, are likely to have a more deleterious impact in human disease than common variants. Here we present the carrier statistic, a statistical framework that prioritizes disease-related rare variants by integrating gene expression data. By quantifying the impact of rare variants on gene expression, the carrier statistic prioritizes those rare variants that have large functional consequences in patients. Through simulation studies and analysis of a real multi-omics dataset, we demonstrated that the carrier statistic is applicable in studies with limited sample sizes (a few hundred) and achieves substantially higher sensitivity than existing rare variant association methods. Application to Alzheimer's disease reveals 16 rare variants within 15 genes with extreme carrier statistics. We also found a strong excess of rare variants among the top prioritized genes in patients compared with healthy individuals. The carrier statistic method can be applied to various rare variant types and is adaptable to other omics data modalities, offering a powerful tool for investigating the molecular mechanisms underlying complex diseases.
Existing rare variant association methods often lack statistical power when sample sizes are small. Here we propose a novel integrative statistical framework, the carrier statistic, which leverages paired genotype and gene expression data to quantify the functional impact of rare variants and to enhance the power to detect rare variants responsible for disease. Extensive simulations demonstrate that the carrier statistic provides well-calibrated false discovery rates, shows substantially higher sensitivity than existing methods, and remains robust under unbalanced case-control ratios. By analyzing a real multi-omics dataset for Alzheimer's disease, we identified 16 rare variants within 15 genes with extreme carrier statistics.
We hope that the results presented in this paper can highlight the promise of the carrier statistic approach and will encourage future disease studies to collect both genotype and gene expression data for the same individuals. As multi-omics and genome sequencing data continue to expand, we anticipate that carrier statistics will be a valuable tool for elucidating the molecular mechanism underlying human complex diseases.
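The abstract describes the carrier statistic only at a high level, as a way of quantifying the impact of a rare variant on gene expression. As a rough illustration of that idea, and not the authors' exact definition, the sketch below standardizes the mean expression of a variant's carriers against the non-carriers. The function name `carrier_z_score`, the z-score form, and the toy values are all assumptions made for this example.

```python
import numpy as np


def carrier_z_score(expression, is_carrier):
    """Illustrative score (an assumption, not the paper's formula):
    mean expression of carriers, standardized by the mean and sample
    standard deviation of non-carriers for the same gene."""
    expression = np.asarray(expression, dtype=float)
    is_carrier = np.asarray(is_carrier, dtype=bool)
    if expression.shape != is_carrier.shape:
        raise ValueError("expression and is_carrier must have the same shape")

    carriers = expression[is_carrier]
    noncarriers = expression[~is_carrier]
    if carriers.size == 0 or noncarriers.size < 2:
        raise ValueError("need at least one carrier and two non-carriers")

    sd = noncarriers.std(ddof=1)  # sample standard deviation
    if sd == 0:
        raise ValueError("non-carrier expression has zero variance")
    return (carriers.mean() - noncarriers.mean()) / sd


# Toy data (hypothetical): five individuals, the last one carries the variant.
expression = [1.0, 2.0, 3.0, 2.0, 8.0]
is_carrier = [False, False, False, False, True]
score = carrier_z_score(expression, is_carrier)
```

Under this reading, a score far from zero in either direction flags a variant whose carriers show unusually high or low expression of the gene. How the published method computes, calibrates, and compares such scores between patients and healthy individuals is specified in the paper itself.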
genetics & heredity