Personalized graph feature-based multi-omics data integration for cancer subtype identification

Saiful Islam, Md. Nahid Hasan
2024-08-17
Abstract: Cancer is a highly heterogeneous disease with significant variability in molecular features and clinical outcomes, making diagnosis and treatment challenging. In recent years, high-throughput omic technologies have facilitated the discovery of mechanisms underlying various cancer subtypes by providing diverse omics data, such as gene expression, DNA methylation, and miRNA expression. However, the complexity and heterogeneity of multi-omics data present significant challenges for their integration in exploring cancer subtypes. Various methods have been proposed to address these challenges. In this paper, we propose a novel and straightforward approach for identifying cancer subtypes by integrating patient-specific subnetwork features from different omics data. We construct patient-specific induced subnetworks using a random walk with restart algorithm on patient similarity networks (PSNs) and compute nine structural properties that capture essential network topology. These features are integrated across the three omic datasets to form comprehensive patient profiles. K-means clustering is then applied for cancer subtype identification. We evaluate our approach on five cancer datasets (breast invasive carcinoma, colon adenocarcinoma, glioblastoma multiforme, kidney renal clear cell carcinoma, and lung squamous cell carcinoma), each with three omic data types. The evaluation shows that our method produces promising and effective results, demonstrating competitive or superior performance compared to existing methods and underscoring its potential for advancing personalized cancer diagnosis and treatment.
Quantitative Methods, Social and Information Networks, Genomics
What problem does this paper attempt to address?
This paper addresses the problem of cancer subtype identification. Cancer is highly heterogeneous: even within a single cancer type, patients differ markedly in molecular characteristics and clinical outcomes, which makes effective diagnosis and treatment difficult. High-throughput omics technologies have helped uncover the mechanisms behind many cancer subtypes by supplying data such as gene expression, DNA methylation, and miRNA expression, but the complexity and heterogeneity of multi-omics data make it hard to integrate them for subtype discovery. To address this, the paper proposes a novel and straightforward method that identifies cancer subtypes by integrating features of patient-specific subnetworks built from each omics type. The steps are:

1. **Constructing Patient Similarity Network (PSN)**: For each omics type, build a PSN whose edges are weighted by the cosine similarity between patients' molecular profiles, so that patients with similar profiles in that omics type are strongly connected.
2. **Subnetwork construction**: In each omics-specific PSN, run a random walk with restart from every patient to derive that patient's induced subnetwork, which reaches both neighboring and more distant nodes.
3. **Subnetwork feature extraction**: From each subnetwork, compute nine structural properties that capture key aspects of its topology: average node degree, average node strength, coefficient of variation of node strength, weighted density, trace, the largest and second-largest eigenvalues of the Laplacian matrix, average clustering coefficient, average weighted betweenness centrality, and average weighted closeness centrality.
4. **Network feature fusion**: Average the feature vectors from the three omics types into a single comprehensive feature vector per patient.
5. **Sample clustering**: Apply K-means clustering to the fused feature vectors, using the silhouette score to choose the number of clusters; the resulting clusters are the identified cancer subtypes.

The paper evaluates the method on five cancer datasets and compares it with four existing methods, reporting competitive or superior performance in cancer subtype identification. The authors conclude that the method copes well with the complexity and heterogeneity of multi-omics data and offers a potential tool for personalized cancer diagnosis and treatment.
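Step 1 (building a cosine-similarity PSN for one omics type) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: it takes a patients-by-features matrix, removes self-loops, and clips negative cosine similarities to zero so that all edge weights are non-negative. The summary does not say whether the paper sparsifies the PSN or how it treats negative similarities, and the helper name `cosine_psn` is hypothetical.

```python
import numpy as np


def cosine_psn(X):
    """Hypothetical helper: weighted PSN for one omics matrix.

    X has shape (n_patients, n_features). Returns a symmetric
    (n_patients, n_patients) adjacency matrix of cosine similarities.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0            # all-zero profiles stay zero vectors
    Xn = X / norms                     # unit-length patient profiles
    S = Xn @ Xn.T                      # pairwise cosine similarity
    np.fill_diagonal(S, 0.0)           # no self-loops in the PSN
    return np.clip(S, 0.0, None)       # assumption: negative similarities dropped


# Toy example: patients 0 and 1 point the same way, patient 2 is orthogonal.
X = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
W = cosine_psn(X)
# W[0, 1] == 1.0, and every entry involving patient 2 is 0.0
```

Cosine similarity ignores the overall magnitude of a profile, so patients 0 and 1 are treated as identical even though patient 1's values are twice as large.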
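For step 2 (RWR-based subnetwork construction), the sketch below computes random-walk-with-restart visiting probabilities by power iteration, p = (1 - r) * P^T p + r * e_s, where P is the row-normalized PSN and e_s is the indicator vector of the seed patient. The summary does not state how the walk is turned into a node set, so this sketch assumes a hypothetical rule: keep the seed plus the k - 1 highest-scoring patients and take the subgraph they induce. The restart probability r = 0.3, the size k = 10, and both function names are illustrative assumptions, not values or names from the paper.

```python
import numpy as np


def rwr_scores(W, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Random walk with restart from `seed` on a weighted PSN (power iteration)."""
    W = np.asarray(W, dtype=float)
    rowsum = W.sum(axis=1, keepdims=True)
    # Row-stochastic transition matrix; isolated nodes get all-zero rows.
    P = np.divide(W, rowsum, out=np.zeros_like(W), where=rowsum > 0)
    e = np.zeros(len(W))
    e[seed] = 1.0                                  # restart distribution: the seed patient
    p = e.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (P.T @ p) + restart * e
        if np.abs(p_next - p).sum() < tol:         # L1 convergence check
            return p_next
        p = p_next
    return p


def induced_subnetwork(W, seed, k=10, restart=0.3):
    """Hypothetical rule: seed plus its k - 1 highest-RWR patients, induced subgraph."""
    W = np.asarray(W, dtype=float)
    scores = rwr_scores(W, seed, restart)
    scores[seed] = np.inf                          # always keep the patient itself
    nodes = np.sort(np.argsort(-scores)[:k])       # top-k node indices, in index order
    return nodes, W[np.ix_(nodes, nodes)]


# Toy PSN: two disconnected triangles, {0, 1, 2} and {3, 4, 5}.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
nodes, sub = induced_subnetwork(W, seed=0, k=3)
print(nodes)  # [0 1 2]: the walk never leaves the seed's triangle
```

A higher restart probability keeps the walk closer to the seed; a lower one lets it reach more remote patients, which is the trade-off the summary alludes to when it mentions exploring both neighboring and remote nodes.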
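Step 3 (subnetwork feature extraction) lists quantities whose exact definitions the summary does not give, so the sketch below fixes one reasonable reading of each and labels it. Assumptions: "trace" is the trace of the weighted Laplacian; weighted density is total edge weight divided by the number of node pairs; the clustering coefficient is Onnela et al.'s geometric-mean weighted version; shortest-path lengths for betweenness and closeness use 1 / similarity as edge length; betweenness is normalized as for undirected graphs; closeness averages over reachable nodes. Listed this way the step yields ten numbers, because the two Laplacian eigenvalues are kept separately. Function names are hypothetical, and the code uses NumPy plus Brandes' algorithm with Dijkstra so it stays self-contained.

```python
import heapq

import numpy as np


def _betweenness_and_distances(length):
    """Brandes' algorithm with Dijkstra; length[i, j] = np.inf means no edge."""
    n = length.shape[0]
    bc, dist = np.zeros(n), np.full((n, n), np.inf)
    for s in range(n):
        d = np.full(n, np.inf)
        d[s] = 0.0
        sigma = np.zeros(n)                        # number of shortest paths from s
        sigma[s] = 1.0
        preds = [[] for _ in range(n)]
        order, heap, done = [], [(0.0, s)], np.zeros(n, dtype=bool)
        while heap:
            du, u = heapq.heappop(heap)
            if done[u]:
                continue                           # stale heap entry
            done[u] = True
            order.append(u)
            for v in np.flatnonzero(np.isfinite(length[u])):
                alt = du + length[u, v]
                if alt < d[v] - 1e-12:             # strictly shorter path found
                    d[v], sigma[v], preds[v] = alt, sigma[u], [u]
                    heapq.heappush(heap, (alt, v))
                elif abs(alt - d[v]) <= 1e-12:     # another equally short path
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = np.zeros(n)
        for w in reversed(order):                  # accumulate pair dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
        dist[s] = d
    return bc / 2.0, dist                          # undirected: each pair counted twice


def subnetwork_features(W):
    """Structural features of one weighted, symmetric subnetwork with n >= 3 nodes."""
    W = np.array(W, dtype=float)                   # copy so the caller's array is untouched
    np.fill_diagonal(W, 0.0)
    n = W.shape[0]
    degree = (W > 0).sum(axis=1)
    strength = W.sum(axis=1)
    cv_strength = strength.std() / strength.mean() if strength.mean() > 0 else 0.0
    density = W.sum() / (n * (n - 1))              # assumption: mean weight over node pairs
    L = np.diag(strength) - W                      # weighted Laplacian
    eig = np.sort(np.linalg.eigvalsh(L))[::-1]     # eigenvalues, largest first
    C = np.cbrt(W / W.max()) if W.max() > 0 else W # Onnela-style normalized weights
    tri = np.diag(C @ C @ C)                       # weighted closed triangles per node
    pairs = degree * (degree - 1)
    clustering = np.divide(tri, pairs, out=np.zeros(n), where=pairs > 0)
    # Assumption: edge length = 1 / similarity for shortest paths.
    length = np.divide(1.0, W, out=np.full_like(W, np.inf), where=W > 0)
    bc, dist = _betweenness_and_distances(length)
    bc *= 2.0 / ((n - 1) * (n - 2))                # normalize as for undirected graphs
    reach = np.isfinite(dist) & (dist > 0)
    total = np.where(reach, dist, 0.0).sum(axis=1)
    closeness = np.divide(reach.sum(axis=1), total, out=np.zeros(n), where=total > 0)
    return np.array([
        degree.mean(), strength.mean(), cv_strength, density,
        np.trace(L),                               # assumption: trace of the Laplacian
        eig[0], eig[1],
        clustering.mean(), bc.mean(), closeness.mean(),
    ])


triangle = np.ones((3, 3)) - np.eye(3)
path = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
f_tri = subnetwork_features(triangle)
# uniform triangle: degree 2, strength 2, CV 0, density 1, trace 6,
# eigenvalues 3 and 3, clustering 1, betweenness 0, closeness 1
f_path = subnetwork_features(path)
```

Averaging per-node quantities keeps the feature vector the same length regardless of how many patients a subnetwork contains, which is what lets the vectors be compared and clustered later.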
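Step 4 (network feature fusion) is an element-wise mean over omics types. The sketch assumes each omics type yields a patients-by-features matrix with patients in the same row order; `fuse_features` is a hypothetical name. Because the raw features sit on different scales (node degrees versus eigenvalues, for instance), standardizing columns before clustering is a common choice, but the summary does not say whether the paper does so, so the sketch leaves it out.

```python
import numpy as np


def fuse_features(per_omics):
    """Average per-omics feature matrices, each of shape (n_patients, n_features)."""
    mats = [np.asarray(F, dtype=float) for F in per_omics]
    if len({m.shape for m in mats}) != 1:
        raise ValueError("all omics feature matrices must have the same shape")
    return np.mean(np.stack(mats), axis=0)         # element-wise mean across omics


# Three omics types, two patients, two features each.
expr = [[1.0, 2.0], [0.0, 6.0]]
meth = [[3.0, 4.0], [3.0, 3.0]]
mirna = [[5.0, 6.0], [6.0, 0.0]]
fused = fuse_features([expr, meth, mirna])
# fused == [[3.0, 4.0], [3.0, 3.0]]
```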
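For step 5 (sample clustering), the sketch below pairs a small K-means (Lloyd's algorithm with k-means++ seeding) with the mean silhouette width and keeps the k that scores highest. It is written with NumPy only to stay self-contained; scikit-learn's `KMeans` and `silhouette_score` offer the same functionality. The candidate range k = 2..6, the seed, and the function names are illustrative assumptions; the summary does not give the range the authors searched.

```python
import numpy as np


def _kmeanspp(X, k, rng):
    """k-means++ seeding: pick each new centre with probability proportional to D^2."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        p = d2 / d2.sum() if d2.sum() > 0 else np.full(len(X), 1.0 / len(X))
        centers.append(X[rng.choice(len(X), p=p)])
    return np.array(centers)


def kmeans(X, k, n_init=5, max_iter=100, seed=0):
    """Lloyd's algorithm; returns the labels of the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = _kmeanspp(X, k, rng)
        for _ in range(max_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)             # assign each patient to nearest centre
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        inertia = d2[np.arange(len(X)), labels].sum()
        if inertia < best_inertia:
            best, best_inertia = labels, inertia
    return best


def silhouette(X, labels):
    """Mean silhouette width with Euclidean distances."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0                                # undefined for one cluster; treat as worst
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue                               # singletons score 0 by convention
        a = D[i, own].sum() / (own.sum() - 1)      # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return s.mean()


def choose_k(X, k_values=range(2, 7)):
    """Cluster for each candidate k and keep the k with the best silhouette."""
    X = np.asarray(X, dtype=float)
    results = {k: kmeans(X, k) for k in k_values}
    scores = {k: silhouette(X, lab) for k, lab in results.items()}
    best_k = max(scores, key=scores.get)
    return best_k, results[best_k]


# Toy fused features: three tight, well-separated groups of patients.
X = np.array([[0, 0], [0, 0.1], [0.1, 0],
              [10, 10], [10, 10.1], [10.1, 10],
              [-10, 10], [-10, 10.1], [-9.9, 10]])
best_k, labels = choose_k(X)
print(best_k)  # 3
```

The silhouette score rewards clusters that are compact and far from their nearest neighbouring cluster, so it drops both when distinct groups are merged (small k) and when a tight group is split (large k).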