gsQTL: Associating genetic risk variants with gene sets by exploiting their shared variability

Gerard A. Bouland, Niccolò Tesi, Ahmed Mahfouz, Marcel J.T. Reinders
DOI: https://doi.org/10.1101/2024.09.13.612853
2024-09-16
Abstract: To investigate the functional significance of genetic risk loci identified through genome-wide association studies (GWASs), genetic loci are linked to genes based on their capacity to account for variation in gene expression, resulting in expression quantitative trait loci (eQTLs). Following this, gene set analyses are commonly used to gain insights into functionality. However, the efficacy of this approach is hampered by small effect sizes and the burden of multiple testing. We propose an alternative approach: instead of examining the cumulative associations of individual genes within a gene set, we consider the collective variation of the entire gene set. We introduce the concept of gene set QTL (gsQTL), and show it to be more adept at identifying links between genetic risk variants and specific gene sets. Notably, gsQTL is less susceptible to inflation or deflation of significant enrichments than conventional methods. Furthermore, we demonstrate the broader applicability of shared variability within gene sets, for example in the coordinated regulation of genes by a transcription factor or in coordinated differential expression.
Bioinformatics
What problem does this paper attempt to address?
This paper addresses how to establish the functional significance of genetic risk loci identified through genome-wide association studies (GWASs). The conventional approach links loci to genes through expression quantitative trait loci (eQTLs), that is, loci that account for variation in gene expression, and then applies gene set analysis to the linked genes to gain functional insight. That approach is hampered by small effect sizes and the burden of multiple testing. The paper proposes gene set QTL (gsQTL) as an alternative: instead of examining the cumulative associations of individual genes within a gene set, it considers the collective variation of the entire gene set. The authors report that gsQTL is more adept at identifying links between genetic risk variants and specific gene sets, and that it is less susceptible to inflation or deflation of significant enrichments than conventional methods. They also show that shared variability within gene sets applies more broadly, for example to the coordinated regulation of genes by a transcription factor or to coordinated differential expression. In short, gsQTL is presented as a way to detect gene-set-level associations that per-gene analyses may miss.
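The idea of testing a variant against the collective variation of a gene set, rather than against each gene separately, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the abstract does not say how the shared variability is summarized, so the first principal component is used here as a stand-in, and the function names (`gene_set_score`, `association`), the simulated data, and all effect sizes are hypothetical and not taken from the paper.

```python
import numpy as np


def gene_set_score(expr):
    """Summarize a samples x genes matrix by its first principal component.

    Assumption: PC1 stands in for the "collective variation" of the gene set;
    the paper may use a different summary.
    """
    centered = expr - expr.mean(axis=0)
    scaled = centered / centered.std(axis=0)
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    return u[:, 0] * s[0]


def association(genotype, values):
    """Pearson correlation between genotype dosage and a per-sample value."""
    return np.corrcoef(genotype, values)[0, 1]


# Hypothetical simulated data: one variant weakly shifts a latent factor
# that is shared by every gene in the set.
rng = np.random.default_rng(0)
n_samples, n_genes = 500, 20
genotype = rng.integers(0, 3, size=n_samples).astype(float)  # dosage 0/1/2
latent = 0.5 * genotype + rng.normal(size=n_samples)
expr = 0.5 * latent[:, None] + rng.normal(size=(n_samples, n_genes))

# Gene-set-level association versus the per-gene (eQTL-style) associations.
score = gene_set_score(expr)
r_set = abs(association(genotype, score))
r_genes = np.array([abs(association(genotype, expr[:, j])) for j in range(n_genes)])

print(f"gene-set |r|: {r_set:.3f}")
print(f"mean per-gene |r|: {r_genes.mean():.3f}")
```

In this toy setting each gene carries only a small share of the variant's effect, so the per-gene correlations are weak, while the shared component pools that signal across the set into a single test. That mirrors the motivation given above (small effect sizes and multiple testing), but it says nothing about how the method performs on real data.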