A two-phase cuckoo search based approach for gene selection and deep learning classification of cancer disease using gene expression data with a novel fitness function
Amol Avinash Joshi, Rabia Musheer Aziz
DOI: https://doi.org/10.1007/s11042-024-18327-4
IF: 2.577
2024-02-06
Multimedia Tools and Applications
Abstract: The early detection of cancer is of paramount importance in the medical field, as it can lead to more precise and effective interventions for successful cancer treatments. Cancer datasets typically contain numerous gene expression levels as features but only a limited number of samples. Thus, feature selection is a crucial initial step to streamline prediction algorithms. The selected features, or genes, play a pivotal role in cancer identification, treatment selection, and variation analysis among different techniques. To address this challenge, we present two novel methodologies that combine Cuckoo Search (CS) and Spider Monkey Optimization (SMO), referred to as SMOCS (Cuckoo Search followed by Spider Monkey Optimization) and CSSMO (Spider Monkey Optimization followed by Cuckoo Search). These approaches are designed to harness the strengths of both metaheuristic algorithms to identify a subset of genes that aids early-stage cancer prediction. Additionally, to enhance the accuracy of both algorithms, we employ a gene expression reduction method known as minimum Redundancy Maximum Relevance (mRMR) to reduce redundancy in cancer datasets. Subsequently, these gene subsets are classified using Deep Learning (DL) to identify distinct groups or classes associated with specific cancer types. We evaluate the performance of our proposed approaches on six different cancer datasets, assessing cancer sample classification and prediction through metrics such as Recall, Precision, F1-Score, and confusion matrix analysis. Our gene selection methods, in conjunction with DL, achieve significantly improved prediction accuracy on large gene expression datasets compared to existing Deep Learning (DL) and Machine Learning models. Experimental results show that both SMOCS and CSSMO classify cancer with high prediction accuracy, but the SMOCS algorithm gives higher prediction accuracy on all six datasets, with a maximum accuracy of 100%.
computer science, information systems, theory & methods,engineering, electrical & electronic, software engineering
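
The evaluation metrics named in the abstract (Recall, Precision, F1-Score, and accuracy) can all be read off a confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' evaluation code: the function names `per_class_metrics` and `accuracy`, the row/column convention (rows are true classes, columns are predicted classes), and the example matrix are assumptions made for illustration only.

```python
def per_class_metrics(confusion):
    """Per-class precision, recall and F1 from a square confusion matrix.

    Assumed convention: confusion[i][j] is the number of samples whose
    true class is i and whose predicted class is j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]                                # correctly predicted as k
        fn = sum(confusion[k]) - tp                         # true k, predicted otherwise
        fp = sum(confusion[i][k] for i in range(n)) - tp    # predicted k, true otherwise
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def accuracy(confusion):
    """Fraction of all samples that lie on the diagonal of the matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else 0.0


# Hypothetical two-class example (made-up counts, not data from the paper).
example = [
    [8, 2],  # true class 0: 8 predicted as 0, 2 predicted as 1
    [1, 9],  # true class 1: 1 predicted as 0, 9 predicted as 1
]
metrics = per_class_metrics(example)
overall = accuracy(example)
```

With more than two classes, the per-class values are usually averaged (for example, a macro average) before being reported; the abstract does not say which averaging the authors used.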