Gene-centric gene-gene interaction: A model-based kernel machine method

Shaoyu Li, Yuehua Cui
DOI: https://doi.org/10.1214/12-AOAS545
2012-09-28
Abstract: Much of the natural variation for a complex trait can be explained by variation in DNA sequence levels. As part of sequence variation, gene-gene interaction has been ubiquitously observed in nature, where its role in shaping the development of an organism has been broadly recognized. The identification of interactions between genetic factors has been progressively pursued via statistical or machine learning approaches. A large body of currently adopted methods, either parametrically or nonparametrically, predominantly focus on pairwise single marker interaction analysis. As genes are the functional units in living organisms, analysis by focusing on a gene as a system could potentially yield more biologically meaningful results. In this work, we conceptually propose a gene-centric framework for genome-wide gene-gene interaction detection. We treat each gene as a testing unit and derive a model-based kernel machine method for two-dimensional genome-wide scanning of gene-gene interactions. In addition to the biological advantage, our method is statistically appealing because it reduces the number of hypotheses tested in a genome-wide scan. Extensive simulation studies are conducted to evaluate the performance of the method. The utility of the method is further demonstrated with applications to two real data sets. Our method provides a conceptual framework for the identification of gene-gene interactions which could shed novel light on the etiology of complex diseases.
Applications
What problem does this paper attempt to address?
The paper addresses the following problem: **how to detect gene-gene interactions at the gene level, in order to better understand the genetic basis of complex traits and, in particular, the mechanisms underlying complex diseases**. Most existing gene-gene interaction studies analyze interactions between single SNP loci and ignore the role of a gene as a functional unit. The authors propose a gene-centric framework that treats each gene as a testing unit and performs genome-wide gene-gene interaction detection with a model-based kernel machine method. The approach is biologically motivated, and it also reduces the number of hypotheses tested in a genome-wide scan, which can improve statistical power.

### Main problems and solutions

1. **Limitations of existing methods**:
   - Most existing gene-gene interaction methods focus on interactions between single SNP loci.
   - These methods struggle with complex association patterns (such as genetic heterogeneity, gene-gene interactions, and gene-environment interactions), which leads to low replication rates and results that are difficult to interpret.
2. **Proposed solutions**:
   - **Gene-centric framework**: Treat each gene as a whole system instead of analyzing single SNP loci separately.
   - **Kernel machine method**: Use a smoothing-spline ANOVA model to model gene-gene interactions.
   - **Fewer hypothesis tests**: Because genes rather than SNPs are the testing units, a genome-wide scan involves fewer tests, which eases the multiple-testing burden and can increase the power of the test.

### Advantages of the method

- **Biological advantages**: The gene-centric approach is closer to biological reality, because genes are the functional units of living organisms and variants located within genes are more likely to be functionally important.
- **Statistical advantages**: Testing fewer hypotheses can increase statistical power, and the kernel machine can capture complex nonlinear effects among multiple variants.

In conclusion, the paper proposes a gene-centric framework for gene-gene interaction analysis that aims to address the shortcomings of existing single-marker methods and to provide new insight into the genetic mechanisms of complex diseases.
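
The effect of choosing genes as testing units on the number of tests can be made concrete with a small calculation. The sketch below is only an illustration under stated assumptions: the study size (1,000 genes, 10 SNPs per gene) is hypothetical, the helper names `pairwise_tests` and `bonferroni_threshold` are invented for this example, and a Bonferroni correction is used as a simple stand-in for a multiple-testing adjustment. The summary above does not say which correction the authors apply.

```python
from math import comb


def pairwise_tests(n_units: int) -> int:
    """Number of unordered pairs among n_units testing units."""
    return comb(n_units, 2)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level under a Bonferroni correction (assumed here)."""
    return alpha / n_tests


# Hypothetical study size, chosen only for illustration.
n_genes = 1000
snps_per_gene = 10
n_snps = n_genes * snps_per_gene

gene_pairs = pairwise_tests(n_genes)  # gene-gene scan: 499,500 tests
snp_pairs = pairwise_tests(n_snps)    # SNP-SNP scan: 49,995,000 tests

print(f"gene-level tests: {gene_pairs:,}")
print(f"SNP-level tests:  {snp_pairs:,}")
print(f"reduction factor: {snp_pairs / gene_pairs:.1f}")
print(f"gene-level threshold: {bonferroni_threshold(gene_pairs):.2e}")
print(f"SNP-level threshold:  {bonferroni_threshold(snp_pairs):.2e}")
```

With these assumed numbers the gene-level scan runs roughly one hundred times fewer tests, so each test faces a correspondingly less stringent per-test threshold. Whether that translates into higher power in practice also depends on how well the gene-level test captures the underlying signal.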