Exploring Gene-Gene Interaction in Family-Based Data with an Unsupervised Machine Learning Method: EPISFA.

Xiao Xiang,Siyue Wang,Tianyi Liu,Mengying Wang,Jiawen Li,Jin Jiang,Tao Wu,Yonghua Hu
DOI: https://doi.org/10.1002/gepi.22342
2020-01-01
Genetic Epidemiology
Abstract:Gene-gene interaction (G x G) is thought to fill the gap between the estimated heritability of complex diseases and the limited genetic proportion explained by identified single-nucleotide polymorphisms. The current tools for exploring G x G were often developed for case-control designs with less considerations for their applications in families. Family-based studies are robust against bias led from population stratification in genetic studies and helpful in understanding G x G. We proposed a new algorithm epistasis sparse factor analysis (EPISFA) and epistasis sparse factor analysis for linkage disequilibrium (EPISFA-LD) based on unsupervised machine learning to screen G x G. Extensive simulations were performed to compare EPISFA/EPISFA-LD with a classical family-based algorithm FAM-MDR (family-based multifactor dimensionality reduction). The results showed that EPISFA/EPISFA-LD is a tool of both high power and computational efficiency that could be applied in family designs and is applicable within high-dimensionality datasets. Finally, we applied EPISFA/EPISFA-LD to a real dataset drawn from the Fangshan/family-based Ischemic Stroke Study in China. Five pairs of G x G were discovered by EPISFA/EPISFA-LD, including three pairs verified by other algorithms (FAM-MDR and logistic), and an additional two pairs uniquely identified by EPISFA/EPISFA-LD only. The results from EPISFA might offer new insights for understanding the genetic etiology of complex diseases. EPISFA/EPISFA-LD was implemented in R. All relevant source code as well as simulated data could be freely downloaded from .
What problem does this paper attempt to address?