LASSO-MOGAT: A Multi-Omics Graph Attention Framework for Cancer Classification

Fadi Alharbi, Aleksandar Vakanski, Murtada K. Elbashir, Mohanad Mohammed
DOI: https://doi.org/10.20935/AcadBiol7325
2024-08-31
Abstract: The application of machine learning methods to analyze changes in gene expression patterns has recently emerged as a powerful approach in cancer research, enhancing our understanding of the molecular mechanisms underpinning cancer development and progression. Combining gene expression data with other types of omics data has been reported by numerous works to improve cancer classification outcomes. Despite these advances, effectively integrating high-dimensional multi-omics data and capturing the complex relationships across different biological layers remains challenging. This paper introduces LASSO-MOGAT (LASSO-Multi-Omics Gated ATtention), a novel graph-based deep learning framework that integrates messenger RNA, microRNA, and DNA methylation data to classify 31 cancer types. Utilizing differential expression analysis with LIMMA and LASSO regression for feature selection, and leveraging Graph Attention Networks (GATs) to incorporate protein-protein interaction (PPI) networks, LASSO-MOGAT effectively captures intricate relationships within multi-omics data. Experimental validation using five-fold cross-validation demonstrates the method's precision, reliability, and capacity for providing comprehensive insights into cancer molecular mechanisms. The computation of attention coefficients for the edges in the graph by the proposed graph-attention architecture based on protein-protein interactions proved beneficial for identifying synergies in multi-omics data for cancer classification.
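The edge attention coefficients mentioned in the abstract follow the standard GAT formulation. For connected nodes i and j, a raw score e_ij = LeakyReLU(a^T [W h_i || W h_j]) is computed and then normalised with a softmax over each node's neighbourhood. The NumPy sketch below shows a single attention head that uses a PPI network as the adjacency mask. It is an illustration under stated assumptions, not the authors' implementation. The helper name `gat_attention`, the toy 4-node graph, the random weights and the LeakyReLU slope of 0.2 are all hypothetical. This summary also does not say how LASSO-MOGAT maps multi-omics features onto graph nodes, or how many heads and layers it uses.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    # LeakyReLU; slope 0.2 matches the original GAT paper (assumed here, not taken from LASSO-MOGAT)
    return np.where(x > 0, x, slope * x)


def gat_attention(h, adj, W, a):
    """Single-head graph attention (hypothetical helper, not the authors' code).

    h   : (N, F) node features
    adj : (N, N) binary adjacency, e.g. a PPI network (1 = interaction)
    W   : (F, F_out) shared linear projection
    a   : (2 * F_out,) attention vector
    Returns the (N, N) attention coefficients and (N, F_out) aggregated features.
    """
    z = h @ W                                    # project every node: (N, F_out)
    f_out = z.shape[1]
    # a^T [z_i || z_j] splits into a[:F_out] . z_i + a[F_out:] . z_j
    src = z @ a[:f_out]
    dst = z @ a[f_out:]
    e = leaky_relu(src[:, None] + dst[None, :])  # raw edge scores e_ij: (N, N)
    # Keep only PPI edges plus self-loops; every other pair gets weight zero
    mask = (adj + np.eye(adj.shape[0])) > 0
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)         # subtract row max for numerical stability
    w = np.exp(e)
    alpha = w / w.sum(axis=1, keepdims=True)     # softmax over each node's neighbourhood
    h_out = alpha @ z                            # attention-weighted sum (nonlinearity omitted)
    return alpha, h_out


# Toy example: a path 0-1-2 plus an isolated node 3
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0]], dtype=float)
h = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
a = rng.normal(size=4)
alpha, h_out = gat_attention(h, adj, W, a)
print(np.round(alpha, 3))
```

Each row of `alpha` sums to one over a node and its PPI neighbours, so the isolated node 3 attends only to itself. In a trained model, W and a would be learned, and a nonlinearity (for example ELU) and multiple heads would typically be added.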
Machine Learning
What problem does this paper attempt to address?
The paper aims to address the following issues:

1. **Integration of Multi-Omics Data**: Integrating high-dimensional multi-omics data and capturing the complex relationships between different biological layers remains challenging. The paper proposes LASSO-MOGAT (LASSO-Multi-Omics Gated ATtention), a framework that integrates messenger RNA (mRNA), microRNA (miRNA), and DNA methylation data to classify 31 cancer types.
2. **Feature Selection and Dimensionality Reduction**: Differential expression analysis with LIMMA (Linear Models for Microarray Data), which identifies differentially expressed genes (DEGs), is combined with LASSO regression to select the most informative multi-omics features and thereby improve classification performance.
3. **Utilizing Protein-Protein Interaction Networks**: Graph Attention Networks (GATs) incorporate protein-protein interaction (PPI) networks into the model. The attention coefficients computed for the graph's edges capture relationships within the multi-omics data.

Five-fold cross-validation experiments are used to validate the accuracy and reliability of LASSO-MOGAT in classifying cancers. The authors also report that the method offers insights into the molecular mechanisms of cancer. The method's strengths lie in feature selection, which identifies the most relevant features in the multi-omics data, and in model interpretability. Together, these are reported to improve the performance of the classification model.
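The LASSO feature-selection step can be illustrated with a small NumPy sketch. It solves the L1-penalised least-squares problem (1/(2n))·||y − Xb||² + lam·||b||₁ by proximal gradient descent (ISTA) and keeps the features whose coefficients remain nonzero. This is a minimal sketch under assumptions, not the authors' pipeline. The LIMMA prefilter (an R/Bioconductor package) is not reproduced here. The continuous response `y`, the penalty `lam=0.1`, the synthetic data and the helper names `lasso_ista` and `select_features` are all hypothetical. A 31-class problem would more naturally use an L1-penalised multinomial or one-vs-rest logistic model.

```python
import numpy as np


def soft_threshold(x, t):
    # Proximal operator of t * ||x||_1: shrinks toward zero and sets small entries to exactly 0
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=2000):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 with ISTA (hypothetical helper)."""
    n, p = X.shape
    # Lipschitz constant of the gradient: largest eigenvalue of X^T X / n
    L = np.linalg.norm(X, 2) ** 2 / n
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - grad / L, lam / L)
    return b


def select_features(X, y, lam):
    """Return indices of features with nonzero LASSO coefficients (hypothetical helper)."""
    # Standardise features and centre y so the L1 penalty acts on a common scale
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    coef = lasso_ista(Xs, yc, lam)
    return np.flatnonzero(np.abs(coef) > 1e-8), coef


# Synthetic stand-in for an omics matrix: 100 samples x 20 features,
# where only features 2 and 7 drive the response (illustrative data only)
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 20))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=100)
idx, coef = select_features(X, y, lam=0.1)
print("selected features:", idx)
```

Features 2 and 7 carry the signal here, so they should appear among the selected indices. Raising `lam` shrinks more coefficients to exactly zero, which is the mechanism LASSO uses to reduce dimensionality.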