A Bioinformatic Approach Validated Utilizing Machine Learning Algorithms to Identify Relevant Biomarkers and Crucial Pathways in Gallbladder Cancer

Rabea Khatun, Wahia Tasnim, Maksuda Akter, Md Manowarul Islam, Md. Ashraf Uddin, Md. Zulfiker Mahmud, Saurav Chandra Das
2024-10-18
Abstract: Gallbladder cancer (GBC) is the most frequent cause of disease among biliary tract neoplasms. Identifying the molecular mechanisms and biomarkers linked to GBC progression has been a significant challenge in scientific research. Few recent studies have explored the roles of biomarkers in GBC. Our study aimed to identify biomarkers in GBC using machine learning (ML) and bioinformatics techniques. We compared GBC tumor samples with normal samples to identify differentially expressed genes (DEGs) from two microarray datasets (GSE100363, GSE139682) obtained from the NCBI GEO database. A total of 146 DEGs were found, with 39 up-regulated and 107 down-regulated genes. Functional enrichment analysis of these DEGs was performed using Gene Ontology (GO) terms and REACTOME pathways through DAVID. The protein-protein interaction network was constructed using the STRING database. To identify hub genes, we applied three ranking algorithms: Degree, MNC, and Closeness Centrality. The intersection of hub genes from these algorithms yielded 11 hub genes. Simultaneously, two feature selection methods (Pearson correlation and recursive feature elimination) were used to identify significant gene subsets. We then developed ML models using SVM and RF on the GSE100363 dataset, with validation on GSE139682, to determine the gene subset that best distinguishes GBC samples. The hub genes outperformed the other gene subsets. Finally, NTRK2, COL14A1, SCN4B, ATP1A2, SLC17A7, SLIT3, COL7A1, CLDN4, CLEC3B, ADCYAP1R1, and MFAP4 were identified as crucial genes, with SLIT3, COL7A1, and CLDN4 being strongly linked to GBC development and prediction.
Genomics, Machine Learning
What problem does this paper attempt to address?
The paper attempts to address the problem of identifying biomarkers and key pathways associated with the progression of gallbladder cancer (GBC). Specifically, the study aims to identify differentially expressed genes (DEGs) from two microarray datasets (GSE100363 and GSE139682) using machine learning and bioinformatics techniques. Further analysis of these genes includes functional enrichment, pathway enrichment, and protein-protein interaction networks to ultimately determine key hub genes and important gene subsets. Through these methods, the research hopes to find biomarkers that can effectively distinguish gallbladder cancer samples from normal samples and validate the potential value of these biomarkers in diagnosis and prognosis.

### Main research questions include:

1. **How can bioinformatics and machine learning methods be used to identify biomarkers for gallbladder cancer (GBC)? Which method can produce the optimal biomarkers?**
2. **Which differentially expressed genes (DEGs) are significant in distinguishing gallbladder cancer from healthy samples?**
3. **How can the effectiveness of these DEGs in accurately classifying gallbladder cancer samples be validated through machine learning models?**
4. **What is the potential diagnostic and prognostic value of the identified hub genes and DEGs in gallbladder cancer?**

These questions are posed to improve the accuracy of gallbladder cancer diagnosis and prognosis, thereby providing support for effective treatment and management. Additionally, these questions aim to bridge the gap between traditional clinical methods and modern computational techniques, leveraging the power of high-throughput gene expression data and advanced algorithms to achieve this goal.
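
The hub-gene step described in the abstract (ranking network nodes by Degree, MNC, and Closeness Centrality, then intersecting the top-ranked sets) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy network, the node labels `A` to `G`, the cutoff `k = 4`, the alphabetical tie-breaking, and the exact score definitions (MNC is taken as the size of the largest connected component among a node's neighbors, and closeness as the classical reciprocal of mean shortest-path distance). The paper's actual network comes from STRING, and its tooling may define these scores differently.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree(graph):
    """Number of direct interaction partners per node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def mnc(graph):
    """Size of the largest connected component in each node's neighborhood."""
    scores = {}
    for node, neighbors in graph.items():
        unseen, best = set(neighbors), 0
        while unseen:
            stack, size = [unseen.pop()], 1
            while stack:
                current = stack.pop()
                # Only follow edges that stay inside the neighborhood.
                for nxt in graph[current] & unseen:
                    unseen.discard(nxt)
                    stack.append(nxt)
                    size += 1
            best = max(best, size)
        scores[node] = best
    return scores


def closeness(graph):
    """Classical closeness: reachable nodes divided by total BFS distance."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nxt in graph[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def top_k(scores, k):
    """Top-k nodes by score; ties are broken alphabetically (an assumption)."""
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    return set(ranked[:k])


# Hypothetical toy network; labels are placeholders, not real genes.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
         ("C", "D"), ("D", "E"), ("E", "F"), ("E", "G")]
graph = build_graph(edges)

degree_scores = degree(graph)
mnc_scores = mnc(graph)
closeness_scores = closeness(graph)

k = 4  # illustrative cutoff, not taken from the paper
hub_genes = (top_k(degree_scores, k)
             & top_k(mnc_scores, k)
             & top_k(closeness_scores, k))
print(sorted(hub_genes))
```

Intersecting the three rankings keeps only nodes that score highly under every criterion, which is a stricter filter than any single ranking on its own.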