GPA-Tree: Statistical Approach for Functional-Annotation-Tree-Guided Prioritization of GWAS Results

Aastha Khatiwada, Bethany J. Wolf, Ayse Selen Yilmaz, Paula S. Ramos, Maciej Pietrzak, Andrew Lawson, Kelly J. Hunt, Hang J. Kim, Dongjun Chung
DOI: https://doi.org/10.48550/arXiv.2106.06877
2021-06-13
Abstract: Motivation: Despite the great success of genome-wide association studies (GWAS), multiple challenges remain. First, complex traits are often associated with many single nucleotide polymorphisms (SNPs), each with a small or moderate effect size. Second, our understanding of the functional mechanisms through which genetic variants are associated with complex traits is still limited. To address these challenges, we propose GPA-Tree, which simultaneously performs association mapping and identifies key combinations of functional annotations related to risk-associated SNPs by combining a decision tree algorithm with a hierarchical modeling framework. Results: First, we conducted simulation studies to evaluate the proposed GPA-Tree method and compared its performance with existing statistical approaches. The results indicate that GPA-Tree outperforms existing statistical approaches in detecting risk-associated SNPs and identifies the true combinations of functional annotations with high accuracy. Second, we applied GPA-Tree to a systemic lupus erythematosus (SLE) GWAS and functional annotation data including GenoSkyline and GenoSkylinePlus. The results from GPA-Tree highlight the dysregulation of blood immune cells, including but not limited to primary B, memory helper T, regulatory T, neutrophils and CD8+ memory T cells in SLE. These results demonstrate that GPA-Tree can be a powerful tool that improves association mapping while facilitating understanding of the underlying genetic architecture of complex traits and potential mechanisms linking risk-associated SNPs with complex traits.
Genomics, Methodology
What problem does this paper attempt to address?
The paper addresses two main problems:

1. **Complex traits are associated with many single-nucleotide polymorphisms (SNPs)**: Many complex traits, such as systemic lupus erythematosus (SLE), are associated with many SNPs, each with a small or moderate effect. Such SNPs often fail to reach the genome-wide significance threshold (usually \(5\times10^{-8}\)), so many trait-associated SNPs remain unidentified.
2. **Limited understanding of functional mechanisms**: Although a large number of genetic variants associated with complex traits have been identified, the functional mechanisms behind these associations are still poorly understood. In particular, more than 85% of these variants are located in non-coding regions, whose functional roles are difficult to interpret.

To address these challenges, the paper proposes GPA-Tree. GPA-Tree combines a decision tree algorithm with a hierarchical modeling framework, so that it performs association mapping and identifies the combinations of functional annotations related to risk-associated SNPs at the same time. Specifically, GPA-Tree aims to:

- **Improve statistical power**: By integrating GWAS data with functional annotation data, increase the power to detect trait-associated SNPs.
- **Identify key functional annotation combinations**: Identify the combinations of functional annotations related to risk-associated SNPs, to better understand how these SNPs affect complex traits.

Through simulation studies and an application to real data (an SLE GWAS), the paper shows that GPA-Tree outperforms existing statistical approaches in detecting risk-associated SNPs and in identifying the relevant combinations of functional annotations.

These results indicate that GPA-Tree can be a powerful tool that improves association mapping while also advancing understanding of the genetic architecture of complex traits and the mechanisms underlying them.
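
The idea of letting functional annotations inform association mapping can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes a two-group mixture for GWAS p-values (Uniform(0, 1) for null SNPs, Beta(alpha, 1) for risk-associated SNPs), a modeling choice that is not stated in this summary, and it replaces the decision tree with a simple enumeration of every combination of two binary annotations. All names (`simulate`, `fit_em`) and all numeric settings are hypothetical.

```python
import math
import random
from collections import defaultdict


def simulate(n, alpha=0.3, seed=0):
    """Simulate (p-value, annotation pair) for n SNPs under assumed settings."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = (rng.randint(0, 1), rng.randint(0, 1))
        # Hypothetical truth: SNPs carrying both annotations are enriched.
        prior = 0.5 if a == (1, 1) else 0.02
        if rng.random() < prior:
            # Beta(alpha, 1) draw via the inverse CDF, x = u ** (1 / alpha).
            p = rng.random() ** (1.0 / alpha)
        else:
            p = rng.random()  # Uniform(0, 1) under the null
        data.append((max(p, 1e-300), a))  # guard against log(0)
    return data


def fit_em(data, n_iter=30):
    """EM for the assumed mixture, with one prior per annotation combination."""
    alpha = 0.5
    pi = defaultdict(lambda: 0.1)  # initial prior for every combination
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability that each SNP is risk-associated.
        post = []
        for p, a in data:
            f1 = pi[a] * alpha * p ** (alpha - 1.0)
            post.append(f1 / (f1 + (1.0 - pi[a])))
        # M-step for alpha: weighted maximum likelihood of Beta(alpha, 1).
        num = sum(post)
        den = sum(z * math.log(p) for z, (p, _) in zip(post, data))
        alpha = min(max(-num / den, 1e-3), 0.999)
        # M-step for the priors: mean posterior within each combination.
        # (A decision tree would instead learn which combinations to separate.)
        totals, counts = defaultdict(float), defaultdict(int)
        for z, (_, a) in zip(post, data):
            totals[a] += z
            counts[a] += 1
        pi = {a: totals[a] / counts[a] for a in counts}
    # post comes from the last E-step, i.e. before the final M-step update.
    return alpha, pi, post


data = simulate(10000)
alpha_hat, pi_hat, post = fit_em(data)
for combo in sorted(pi_hat):
    print(combo, round(pi_hat[combo], 3))
```

In this toy setting, the estimated prior for each annotation combination plays the role of a tree leaf: SNPs in an enriched combination receive a higher prior, so a moderate p-value in that group yields a higher posterior probability of association than the same p-value elsewhere. With many annotations, enumerating all combinations becomes infeasible, which is one motivation for using a decision tree to select the relevant ones.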