Abstract:

Background: Despite the key role of epistatic interactions between multiple genetic variants in the pathogenesis of complex diseases, detecting such interactions remains a great challenge in genome-wide association studies. Although some existing multi-locus approaches have succeeded on small-scale case-control data, the curse of "combinatorial explosion" prohibits their application to genome-wide analysis. It is therefore indispensable to develop new methods that can reduce the search space for epistatic interactions from an astronomical number of possible combinations of genetic variants to a manageable set of candidates.

Results: We studied case-control data from the viewpoint of binary classification. More precisely, we treated single nucleotide polymorphism (SNP) markers as categorical features and adopted the random forest to discriminate cases from controls. On the basis of the Gini importance given by the random forest, we designed a sliding window sequential forward feature selection (SWSFS) algorithm to select a small set of candidate SNPs that minimizes the classification error, and then statistically tested up to three-way interactions among the candidates. We compared this approach with three existing methods on three simulated disease models and showed that our approach is comparable to, and sometimes more powerful than, the other methods. We applied our approach to a genome-wide case-control dataset for age-related macular degeneration (AMD) and successfully identified two SNPs that were reported to be associated with this disease.

Conclusion: Beyond existing purely statistical approaches, we demonstrated the feasibility of incorporating machine learning methods into genome-wide case-control studies. The Gini importance offers yet another measure of the association between SNPs and complex diseases, thereby complementing existing statistical measures to facilitate the identification of epistatic interactions and the understanding of epistasis in the pathogenesis of complex diseases.
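
To make the Gini importance mentioned in the abstract more concrete, the following minimal sketch computes the decrease in Gini impurity obtained by splitting a toy case-control sample on a single SNP treated as a categorical feature. It is an illustration under stated assumptions, not the authors' implementation: the function names `gini_impurity` and `gini_decrease`, the 0/1/2 genotype coding, the single multiway split, and the toy data are all hypothetical. A random forest would instead accumulate such decreases over every node and every tree in which the SNP is used.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(genotypes, labels):
    """Impurity decrease from one multiway split on a categorical SNP.

    Assumption: one child node per observed genotype. Tree-based
    implementations often use binary splits instead.
    """
    n = len(labels)
    groups = {}
    for genotype, label in zip(genotypes, labels):
        groups.setdefault(genotype, []).append(label)
    # Impurity of the children, weighted by the share of samples in each.
    weighted = sum(len(ys) / n * gini_impurity(ys) for ys in groups.values())
    return gini_impurity(labels) - weighted


# Hypothetical toy data: 1 = case, 0 = control; genotypes coded 0/1/2.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
snp_a = [2, 2, 1, 2, 0, 0, 1, 0]  # genotype tracks disease status closely
snp_b = [0, 1, 2, 0, 0, 1, 2, 0]  # genotype is unrelated to disease status

scores = {"snp_a": gini_decrease(snp_a, labels),
          "snp_b": gini_decrease(snp_b, labels)}
# Rank SNPs from most to least informative, as a feature selector might.
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

In this toy sample `snp_a` separates cases from controls almost perfectly and receives a larger decrease than `snp_b`, whose genotype groups are each half cases and half controls. A ranking of this kind is the sort of ordering a forward selection procedure could start from.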

Entanglement Mapping: A Novel Method to Detect Interacting SNPs in Genome-Wide Studies

Detecting Essential and Removable Interactions in Genome-Wide Association Studies

Mixed Linear Model Approaches of Association Mapping for Complex Traits Based on Omics Variants

A Novel Approach to Encode Two-Way Epistatic Interactions Between Single Nucleotide Polymorphisms

Revealing third-order interactions through the integration of machine learning and entropy methods in genomic studies

Variable selection method for the identification of epistatic models.

Detecting Genetic Interactions with Visible Neural Networks

Nonparametric Disequilibrium Mapping of Functional Sites Using Haplotypes of Multiple Tightly Linked Single-Nucleotide Polymorphism Markers

Interpreting artificial neural networks to detect genome-wide association signals for complex traits

A Random Forest Approach to the Detection of Epistatic Interactions in Case-Control Studies

Searching Genome-Wide Multi-Locus Associations for Multiple Diseases Based on Bayesian Inference.

JS-MA: A Jensen-Shannon Divergence Based Method for Mapping Genome-wide Associations on Multiple Diseases

SCAMPI: A scalable statistical framework for genome-wide interaction testing harnessing cross-trait correlations

A fast algorithm for detecting gene-gene interactions in genome-wide association studies

A unified method for rare variant analysis of gene-environment interactions

New Approaches to Identify Gene-by-Gene Interactions in Genome Wide Association Studies

Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies

Identifying novel genetic and phenotypic associations to genomic features by leveraging off-target reads in exome sequencing data

Inferring Gene-Disease Association by an Integrative Analysis of eQTL Genome-Wide Association Study and Protein-Protein Interaction Data.

Non-subjective power analysis to detect G*E interactions in Genome-Wide Association Studies in presence of confounding factor