Discovering Candidate Genes Regulated by GWAS Signals in Cis and Trans
Samhita Pal, Xinge Jessie Jeng
2024-08-31
Abstract: Understanding the genetic underpinnings of complex traits and diseases has been greatly advanced by genome-wide association studies (GWAS). However, a significant portion of trait heritability remains unexplained, known as "missing heritability". Most GWAS loci reside in non-coding regions, posing challenges in understanding their functional impact. Integrating GWAS with functional genomic data, such as expression quantitative trait loci (eQTLs), can bridge this gap. This study introduces a novel approach to discover candidate genes regulated by GWAS signals in both cis and trans. Unlike existing eQTL studies that focus solely on cis-eQTLs or consider cis- and trans-QTLs separately, we utilize adaptive statistical metrics that can reflect both the strong, sparse effects of cis-eQTLs and the weak, dense effects of trans-eQTLs. Consequently, candidate genes regulated by the joint effects can be prioritized. We demonstrate the efficiency of our method through theoretical and numerical analyses and apply it to adipose eQTL data from the METabolic Syndrome in Men (METSIM) study, uncovering genes playing important roles in the regulatory networks influencing cardiometabolic traits. Our findings offer new insights into the genetic regulation of complex traits and present a practical framework for identifying key regulatory genes based on joint eQTL effects.
Genomics, Methodology
What problem does this paper attempt to address?
### Problems the paper attempts to solve
This paper aims to address several key challenges in genomics research:
1. **The problem of missing heritability**: Although genome-wide association studies (GWAS) have greatly advanced our understanding of the genetic basis of complex traits and diseases, the identified GWAS loci explain only part of the trait heritability. The unexplained remainder is known as "missing heritability".
2. **Understanding the functional impact of non-coding regions**: Most GWAS loci lie in non-coding regions whose functional impact is still unclear, which makes it difficult to understand how they regulate gene expression and affect traits.
3. **Joint analysis of cis- and trans-eQTLs**: Existing eQTL studies usually focus only on cis-eQTLs or consider cis- and trans-eQTLs separately, and therefore do not capture their joint effects.
To address these problems, the authors propose a new method that integrates GWAS signals with functional genomic data (such as eQTLs) to discover candidate genes regulated in both cis and trans. The method uses adaptive statistical metrics that reflect both the strong, sparse effects of cis-eQTLs and the weak, dense effects of trans-eQTLs, so that candidate genes regulated by the joint effects can be prioritized.
### Method overview
1. **Model construction**:
- Suppose there are \(m\) GWAS signals associated with \(q\) disease-related traits, and that any of the \(K\) genes under study may be regulated by these GWAS signals.
- Obtain the summary statistics \(Z_{jk}\) from the marginal association tests between GWAS signal \(j\) and gene \(k\), where \(1\leq j\leq m\) and \(1\leq k\leq K\).
2. **Adaptive statistical metrics**:
- Use adaptive statistical metrics studied in high-dimensional inference, such as the Higher Criticism (HC) and Berk-Jones (BJ) tests, which perform well in detecting weak and rare signals.
- Define the HC and BJ statistics and compute them for each gene to evaluate its significance.
3. **Steps**:
- Identify \(m\) GWAS signals for \(q\) traits related to a specific disease.
- Obtain the p-values of the marginal association tests between these GWAS signals and all \(K\) genes.
- Calculate the adaptive statistical metric \(T_k\) for each gene.
- Rank the genes according to the value of \(T_k\).
- Calculate the p-values of all \(T_k\).
- Select candidate eGenes with significant \(T_k\) p-values through multiple testing.
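
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the HC and BJ statistics are written in their commonly used one-sided forms over the smallest half of the ordered p-values, the null distribution of \(T_k\) is approximated by Monte Carlo under the simplifying assumption that the \(m\) p-values of a gene are independent and uniform, and Benjamini-Hochberg is used as one possible multiple testing procedure. The function names (`hc_statistic`, `bj_statistic`, `bh_reject`, `select_egenes`) and the input layout (an \(m \times K\) array `P` of marginal p-values, with \(m \geq 2\)) are hypothetical.

```python
import numpy as np


def _sorted_clipped(pvals):
    # Sort p-values and keep them strictly inside (0, 1) to avoid division by zero.
    return np.sort(np.clip(np.asarray(pvals, dtype=float), 1e-300, 1 - 1e-12))


def hc_statistic(pvals):
    """Higher Criticism in a common standardized form (assumed, not from the source)."""
    p = _sorted_clipped(pvals)
    m = p.size
    i = np.arange(1, m + 1)
    hc = np.sqrt(m) * (i / m - p) / np.sqrt(p * (1 - p))
    return hc[: max(1, m // 2)].max()


def bj_statistic(pvals):
    """One-sided Berk-Jones in a common form (assumed, not from the source)."""
    p = _sorted_clipped(pvals)
    m = p.size
    half = max(1, m // 2)
    x = (np.arange(1, m + 1) / m)[:half]
    p = p[:half]
    # Binomial Kullback-Leibler divergence between i/m and the i-th smallest p-value.
    kl = x * np.log(x / p) + (1 - x) * np.log((1 - x) / (1 - p))
    return (m * kl * (p < x)).max()


def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    K = p.size
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, K + 1) / K
    reject = np.zeros(K, dtype=bool)
    if passed.any():
        last = np.nonzero(passed)[0].max()
        reject[order[: last + 1]] = True
    return reject


def select_egenes(P, stat=hc_statistic, n_null=2000, alpha=0.05, seed=0):
    """P is an (m, K) array: p-value of GWAS signal j against gene k."""
    rng = np.random.default_rng(seed)
    m, K = P.shape
    # Adaptive metric T_k for each gene.
    T = np.array([stat(P[:, k]) for k in range(K)])
    # Rank genes from largest to smallest T_k.
    ranking = np.argsort(-T)
    # Monte Carlo null of T_k, assuming independent uniform p-values (a simplification).
    null = np.array([stat(rng.uniform(size=m)) for _ in range(n_null)])
    p_T = (1 + (null[None, :] >= T[:, None]).sum(axis=1)) / (n_null + 1)
    # Multiple testing across the K genes.
    return T, ranking, p_T, bh_reject(p_T, alpha)


# Toy usage: 50 GWAS signals, 20 genes, gene 0 carries a few strong effects.
rng = np.random.default_rng(1)
P = rng.uniform(size=(50, 20))
P[:5, 0] = 1e-10
T, ranking, p_T, selected = select_egenes(P)
```

In real data the GWAS signals may be correlated (for example through linkage disequilibrium), in which case the independent-uniform null used here would need to be replaced by a calibration that accounts for that dependence. Passing `stat=bj_statistic` swaps in the BJ metric without changing the rest of the pipeline.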
### Results and applications
1. **Simulation study**:
- The effectiveness and efficiency of this method were verified through simulation studies, especially in the case where different types of eGenes coexist.
- In the simulations, the HC and BJ methods perform well across the scenarios considered and outperform the methods they are compared with.
2. **Real data analysis**:
- The method was applied to the adipose tissue eQTL data from the METSIM study, and 424 significant eGenes were discovered.
- These genes encode multifunctional proteins and regulate multiple cellular processes, providing new insights into the complex genetic regulatory mechanisms.
### Conclusion
This study proposes a new method to discover candidate genes regulated by GWAS signals by jointly analyzing cis- and trans-eQTL effects. The method helps address the problem of missing heritability and provides a deeper understanding of the genetic regulatory mechanisms of complex traits and diseases. The identified genes are candidate targets for functional studies and therapeutic interventions.