Revealing third-order interactions through the integration of machine learning and entropy methods in genomic studies

Burcu Yaldız, Onur Erdoğan, Sevda Rafatov, Cem Iyigün, Yeşim Aydın Son
DOI: https://doi.org/10.1186/s13040-024-00355-3
2024-02-01
BioData Mining
Abstract: Non-linear relationships at the genotype level are essential to understanding the genetic interactions underlying complex disease traits. Genome-Wide Association Studies (GWAS) have revealed statistical associations of SNPs with many complex diseases. Because GWAS results have not fully revealed the genetic background of these disorders, Genome-Wide Interaction Studies have started to gain importance. In recent years, various statistical approaches, such as entropy-based methods, have been suggested for revealing these non-additive interactions between variants. This study presents a novel prioritization workflow that integrates two-step Random Forest (RF) modeling and entropy analysis after PLINK filtering. The PLINK-RF-RF workflow is followed by an entropy-based 3-way interaction information (3WII) method to capture the hidden patterns resulting from non-linear relationships between genotypes in Late-Onset Alzheimer Disease, with the goal of discovering early and differential diagnosis markers.
mathematical & computational biology
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to reveal third-order gene interactions associated with Late-Onset Alzheimer Disease (LOAD) and to discover early diagnostic markers by integrating machine learning methods (Random Forest) and entropy analysis. Specifically:

1. **Background**: Genome-Wide Association Studies (GWAS) have revealed statistical associations of single nucleotide polymorphisms (SNPs) in many complex diseases, but these studies have failed to fully uncover the genetic background of these diseases. Therefore, Genome-Wide Interaction Studies (GWIS) are becoming important.
2. **Methods**: This paper proposes a new prioritization workflow that combines two-step Random Forest (RF) modeling and entropy analysis after PLINK filtering. The specific steps include:
   - Preliminary filtering using PLINK.
   - Further screening of SNPs through RF-RF modeling.
   - Capturing hidden patterns caused by non-linear relationships using the entropy-based third-order interaction information (3WII) method.
3. **Objective**: To discover third-order interacting SNP combinations that are related to LOAD and can serve as potential early diagnostic markers.

Through this approach, the authors hope to reveal the significant contribution of complex genetic interactions to LOAD risk and provide a promising method for disease association studies.
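To make the entropy step more concrete, the following is a minimal sketch of three-way interaction information computed from joint Shannon entropies of three discrete variables (for example, genotype codes). It is not the authors' implementation. The function names `entropy` and `interaction_information` and the toy data are hypothetical, and the sign convention used here, I(A;B;C) = I(A;B|C) - I(A;B), where positive values indicate synergy, is an assumption that may differ from the convention used in the paper.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Shannon entropy in bits of the joint distribution of the given columns."""
    joint = Counter(zip(*columns))  # count each observed joint state
    n = sum(joint.values())
    return -sum((count / n) * log2(count / n) for count in joint.values())


def interaction_information(a, b, c):
    """Three-way interaction information, I(A;B|C) - I(A;B), in bits.

    Assumed sign convention: positive means the three variables carry
    synergistic information, negative means redundant information.
    """
    return (
        entropy(a, b) + entropy(a, c) + entropy(b, c)
        - entropy(a) - entropy(b) - entropy(c)
        - entropy(a, b, c)
    )


# Toy data (hypothetical): c is the XOR of a and b, a purely non-additive
# relationship that no pairwise comparison can detect.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
c = [x ^ y for x, y in zip(a, b)]

synergy = interaction_information(a, b, c)      # XOR case
redundancy = interaction_information(a, a, a)   # three identical copies
print(synergy, redundancy)
```

In the XOR case every pair of variables is statistically independent, yet any two of them fully determine the third, so the measure is positive. Three identical copies share the same single bit, so the measure is negative under this convention.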