multi-GPA-Tree: Statistical Approach for Pleiotropy Informed and Functional Annotation Tree Guided Prioritization of GWAS Results

Aastha Khatiwada, Ayse Selen Yilmaz, Bethany J. Wolf, Maciej Pietrzak, Dongjun Chung
DOI: https://doi.org/10.48550/arXiv.2302.01982
2023-02-04
Abstract: Genome-wide association studies (GWAS) have successfully identified over two hundred thousand genotype-trait associations. Yet some challenges remain. First, complex traits are often associated with many single nucleotide polymorphisms (SNPs), most with small or moderate effect sizes, making them difficult to detect. Second, many complex traits share a common genetic basis due to "pleiotropy" and, though few methods consider it, leveraging pleiotropy can improve statistical power to detect genotype-trait associations with weaker effect sizes. Third, currently available statistical methods are limited in explaining the functional mechanisms through which genetic variants are associated with specific or multiple traits. We propose multi-GPA-Tree to address these challenges. The multi-GPA-Tree approach can identify risk SNPs associated with single as well as multiple traits while also identifying the combinations of functional annotations that can explain the mechanisms through which risk-associated SNPs are linked with the traits. First, we implemented simulation studies to evaluate the proposed multi-GPA-Tree method and compared its performance with an existing statistical approach. The results indicate that multi-GPA-Tree outperforms the existing statistical approach in detecting risk-associated SNPs for multiple traits. Second, we applied multi-GPA-Tree to systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) GWAS, and to Crohn's disease (CD) and ulcerative colitis (UC) GWAS, together with functional annotation data including GenoSkyline and GenoSkylinePlus. Our results demonstrate that multi-GPA-Tree can be a powerful tool that improves association mapping while facilitating understanding of the underlying genetic architecture of complex traits and potential mechanisms linking risk-associated SNPs with complex traits.
Methodology, Applications, Computation
What problem does this paper attempt to address?
This paper addresses three related problems:

1. **Challenges in detecting complex-trait associations**: Genome-wide association studies (GWAS) have identified more than 200,000 genotype-trait associations, but challenges remain. First, complex traits are often associated with many single nucleotide polymorphisms (SNPs), most with small or moderate effect sizes, which makes them difficult to detect. Second, many complex traits share a common genetic basis because of pleiotropy; few existing methods take this into account, even though leveraging pleiotropy can improve the statistical power to detect associations with weaker effect sizes. Third, currently available statistical methods are limited in explaining the functional mechanisms through which genetic variants are associated with one or several traits.
2. **Proposing the multi-GPA-Tree method**: To address these challenges, the authors propose multi-GPA-Tree. The method aims to identify risk SNPs associated with single or multiple traits while also identifying the combinations of functional annotations that can explain how risk-associated SNPs are linked with those traits.
3. **Improving statistical power and functional interpretation**: Based on simulation studies and real-data applications, the authors report that multi-GPA-Tree outperforms an existing statistical approach in detecting risk-associated SNPs for multiple traits and identifies relevant combinations of functional annotations. This improves association mapping and aids understanding of the genetic architecture of complex traits and the mechanisms behind it.
Specifically, multi-GPA-Tree addresses these problems in the following ways:

- **Integrating pleiotropic relationships**: It uses association summary statistics from multiple GWAS and improves statistical power by integrating data for multiple traits simultaneously.
- **Functional annotation integration**: It incorporates functional annotation data, such as GenoSkyline and GenoSkylinePlus, to explain the functional mechanisms linking genetic variants with specific or multiple traits.
- **Variable selection**: It selects the relevant functional annotations, or combinations of annotations, from a large candidate set, which helps identify genetic variants associated with one or more traits more accurately.

Through these steps, multi-GPA-Tree both improves the detection of risk-associated SNPs and offers insight into the genetic architecture and mechanisms of complex traits.
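
The intuition behind combining pleiotropy with functional annotations can be illustrated with a toy calculation. The sketch below is not the authors' algorithm. It assumes a simple mixture model in which null p-values are Uniform(0, 1) and non-null p-values follow a Beta(alpha, 1) density, and it uses a hand-written rule on two hypothetical annotations (`A1`, `A2`) in place of a fitted tree. All numeric values (the `alphas` and the prior probabilities) are made up for illustration.

```python
from itertools import product


def beta_alt_density(p, alpha):
    """Density of Beta(alpha, 1) at p; an assumed model for non-null p-values."""
    return alpha * p ** (alpha - 1.0)


def prior_for(annotations):
    """Hypothetical annotation rule standing in for a tree leaf.

    Returns prior probabilities over association states (z1, z2), where
    z_k = 1 means the SNP is associated with trait k. Values are invented.
    """
    if annotations["A1"] == 1 and annotations["A2"] == 1:
        return {(0, 0): 0.55, (1, 0): 0.10, (0, 1): 0.10, (1, 1): 0.25}
    return {(0, 0): 0.90, (1, 0): 0.04, (0, 1): 0.04, (1, 1): 0.02}


def joint_posterior(pvals, alphas, prior):
    """Posterior over all association states for one SNP."""
    weights = {}
    for state in product((0, 1), repeat=len(pvals)):
        likelihood = 1.0
        for p, alpha, z in zip(pvals, alphas, state):
            # The Uniform(0, 1) null density equals 1, so only non-null
            # traits contribute a factor to the likelihood.
            if z == 1:
                likelihood *= beta_alt_density(p, alpha)
        weights[state] = prior[state] * likelihood
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


# Same GWAS p-values for two traits, different annotation profiles.
pvals = (1e-4, 1e-3)
alphas = (0.4, 0.4)  # assumed, not estimated from data

annotated = joint_posterior(pvals, alphas, prior_for({"A1": 1, "A2": 1}))
unannotated = joint_posterior(pvals, alphas, prior_for({"A1": 0, "A2": 0}))

# Posterior probability that the SNP is associated with both traits.
both_annotated = annotated[(1, 1)]
both_unannotated = unannotated[(1, 1)]
```

With identical p-values, the SNP that carries both annotations receives a higher posterior probability of being associated with both traits, which is the sense in which annotation-informed priors and joint modeling of traits can sharpen prioritization. In the actual method the annotation combinations and model parameters are learned from data rather than fixed by hand.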