A fast algorithm for detecting gene-gene interactions in genome-wide association studies

Jiahan Li, Wei Zhong, Runze Li, Rongling Wu
DOI: https://doi.org/10.1214/14-AOAS771
2015-02-03
Abstract: With the recent advent of high-throughput genotyping techniques, genetic data for genome-wide association studies (GWAS) have become increasingly available, which entails the development of efficient and effective statistical approaches. Although many such approaches have been developed and used to identify single-nucleotide polymorphisms (SNPs) that are associated with complex traits or diseases, few are able to detect gene-gene interactions among different SNPs. Genetic interactions, also known as epistasis, have been recognized to play a pivotal role in contributing to the genetic variation of phenotypic traits. However, because of an extremely large number of SNP-SNP combinations in GWAS, the model dimensionality can quickly become so overwhelming that no prevailing variable selection methods are capable of handling this problem. In this paper, we present a statistical framework for characterizing main genetic effects and epistatic interactions in a GWAS study. Specifically, we first propose a two-stage sure independence screening (TS-SIS) procedure and generate a pool of candidate SNPs and interactions, which serve as predictors to explain and predict the phenotypes of a complex trait. We also propose a rates adjusted thresholding estimation (RATE) approach to determine the size of the reduced model selected by an independence screening. Regularization regression methods, such as LASSO or SCAD, are then applied to further identify important genetic effects. Simulation studies show that the TS-SIS procedure is computationally efficient and has an outstanding finite sample performance in selecting potential SNPs as well as gene-gene interactions. We apply the proposed framework to analyze an ultrahigh-dimensional GWAS data set from the Framingham Heart Study, and select 23 active SNPs and 24 active epistatic interactions for the body mass index variation. It shows the capability of our procedure to resolve the complexity of genetic control.
Applications
What problem does this paper attempt to address?
This paper addresses the detection of gene-gene interactions (epistasis) in genome-wide association studies (GWAS). Many statistical methods identify single-nucleotide polymorphisms (SNPs) associated with complex traits or diseases, but few can detect interactions between different SNPs. The difficulty is dimensionality: p SNPs give p(p-1)/2 candidate SNP-SNP pairs, so the model quickly grows beyond what existing variable selection methods can handle. To meet this challenge, the authors propose a statistical framework for characterizing main genetic effects and epistatic interactions. First, a two-stage sure independence screening (TS-SIS) procedure generates a pool of candidate SNPs and interactions, which serve as predictors to explain and predict the phenotypes of a complex trait. Second, a rates adjusted thresholding estimation (RATE) approach determines the size of the reduced model selected by independence screening. Finally, regularized regression methods such as LASSO or SCAD are applied to the reduced model to identify the important genetic effects. In simulations, the TS-SIS procedure is computationally efficient and performs well in finite samples at selecting both SNPs and gene-gene interactions. Applied to an ultrahigh-dimensional GWAS data set from the Framingham Heart Study, the framework selects 23 active SNPs and 24 active epistatic interactions for body mass index variation.
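The screening idea can be illustrated with a small simulation. The sketch below is not the authors' TS-SIS implementation. It assumes a simple reading of two-stage screening: rank SNPs by absolute marginal correlation with the phenotype, keep the top few, then rank pairwise products formed only among the retained SNPs. The function names (`abs_corr`, `two_stage_screen`), the fixed cutoffs `k_main` and `k_inter` (used here in place of RATE), and the simulated data are all hypothetical, and the final LASSO or SCAD step is omitted.

```python
import itertools
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no marginal signal
    return abs(sxy) / math.sqrt(sxx * syy)


def two_stage_screen(genotypes, phenotype, k_main, k_inter):
    """Hypothetical two-stage screen.

    genotypes: list of SNP columns, each a list of 0/1/2 codes per subject.
    Returns (retained SNP indices, retained SNP index pairs), best first.
    """
    # Stage 1: rank every SNP by marginal correlation with the phenotype.
    main = {j: abs_corr(col, phenotype) for j, col in enumerate(genotypes)}
    kept = sorted(main, key=main.get, reverse=True)[:k_main]

    # Stage 2: score products only among retained SNPs, which needs
    # k_main * (k_main - 1) / 2 correlations instead of p * (p - 1) / 2.
    inter = {}
    for j, k in itertools.combinations(sorted(kept), 2):
        product = [a * b for a, b in zip(genotypes[j], genotypes[k])]
        inter[(j, k)] = abs_corr(product, phenotype)
    pairs = sorted(inter, key=inter.get, reverse=True)[:k_inter]
    return kept, pairs


# Simulated data (an assumption for illustration): 300 subjects, 200 SNPs,
# with SNPs 3 and 7 carrying main effects and one interaction.
rng = random.Random(0)
n_subjects, n_snps = 300, 200
genotypes = [
    [sum(rng.random() < 0.5 for _ in range(2)) for _ in range(n_subjects)]
    for _ in range(n_snps)
]
phenotype = [
    1.5 * genotypes[3][i]
    + 1.5 * genotypes[7][i]
    + 2.0 * genotypes[3][i] * genotypes[7][i]
    + rng.gauss(0.0, 1.0)
    for i in range(n_subjects)
]

kept, pairs = two_stage_screen(genotypes, phenotype, k_main=10, k_inter=5)
print("retained SNPs:", kept)
print("retained pairs:", pairs)
```

With 200 SNPs there are 19,900 possible pairs, but the second stage here scores only the 45 pairs among the 10 retained SNPs. A real analysis would choose the cutoffs with a data-driven rule such as RATE and then fit a penalized regression on the retained main effects and interactions.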