Genetic Clustering Algorithm-Based Feature Selection and Divergent Random Forest for Multiclass Cancer Classification Using Gene Expression Data

Senbagamalar, L.

DOI: https://doi.org/10.1007/s44196-024-00416-9

IF: 2.259

2024-02-06

International Journal of Computational Intelligence Systems

Abstract: Computational identification and classification of clinical disorders has gained importance owing to advances in machine learning methodologies. Cancer identification and classification are essential clinical areas to address, and accurate classification of multiple cancer types is still an open problem. In this article, we propose a multiclass cancer classification model that categorizes five different types of cancer using gene expression data. To analyze the available clinical data efficiently, we propose both a feature selection method and a classification method. The feature selection method is a genetic clustering algorithm (GCA) that selects an optimal feature set from RNA gene expression data consisting of 801 samples belonging to five major classes of cancer. It reduces the 1621 gene expressions to a cluster of 21 features. The optimal feature set is the input to the proposed divergent random forest. Based on these features, the classifier assigns each sample to one of 5 cancer classes: breast cancer, colon cancer, kidney cancer, lung cancer, and prostate cancer. The proposed divergent random forest achieved an accuracy of 95.21%, a specificity of 93%, and a sensitivity of 94.29%, outperforming all the other existing multiclass classification algorithms.

computer science, artificial intelligence, interdisciplinary applications

What problem does this paper attempt to address?

Based on the provided text, the problem this paper addresses can be summarized as follows: accurate classification of multiple cancer types is still an open research area. Existing binary classification methods (for example, distinguishing cancer samples from normal samples) serve as effective diagnostic tools during continuous monitoring, but they are limited on large gene expression datasets, because such data may contain samples from several cancer types rather than just two categories. A multi-class classification model that selects features efficiently and classifies accurately is therefore needed. Specifically, the paper proposes a feature selection method based on the Genetic Clustering Algorithm (GCA) and a new Divergent Random Forest (DF) classifier, aiming to solve the following problems:

1. **Dimensionality reduction of high-dimensional gene expression data**: Select the optimal feature set from a large amount of gene expression data through the genetic clustering algorithm to reduce the dimension and complexity of the data.
2. **Multi-class cancer classification**: Use the divergent random forest classifier to classify five different types of cancer (breast cancer, colon cancer, kidney cancer, lung cancer, and prostate cancer).
3. **Improved classification performance**: By optimizing feature selection and classification methods, improve the accuracy, specificity, and sensitivity of classification, thereby outperforming existing classification algorithms in multi-class cancer classification tasks.
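To make the feature selection idea concrete, here is a minimal sketch of a generic genetic algorithm that searches for a small feature subset. It is not the paper's GCA, whose details are not given on this page. The synthetic data, the nearest-centroid fitness function, and every name and parameter (`make_data`, `fitness`, `select_features`, `pop_size`, `mutation_rate`, and so on) are assumptions made for illustration only.

```python
import random

def make_data(n_samples=100, n_features=30, n_classes=5, informative=(0, 1, 2, 3, 4), seed=0):
    """Synthetic stand-in for an expression matrix (hypothetical data).
    Only the `informative` columns depend on the class label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % n_classes
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * ((label + j) % n_classes)  # class-dependent shift
        X.append(row)
        y.append(label)
    return X, y

def fitness(subset, X, y):
    """Training accuracy of a nearest-centroid classifier restricted to `subset`.
    A toy fitness; a real study would use held-out or cross-validated data."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in subset]
    correct = 0
    for x, label in zip(X, y):
        point = [x[j] for j in subset]
        pred = min(classes, key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
        correct += pred == label
    return correct / len(y)

def select_features(X, y, k=5, pop_size=20, generations=15, mutation_rate=0.2, seed=1):
    """Evolve subsets of k feature indices; returns the fittest subset found."""
    rng = random.Random(seed)
    n_features = len(X[0])
    population = [tuple(sorted(rng.sample(range(n_features), k))) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda s: fitness(s, X, y), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection; the best subsets survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(set(a) | set(b))
            child = set(rng.sample(pool, k))  # crossover: draw k indices from the parents' union
            if rng.random() < mutation_rate:  # mutation: swap one index for a random unused one
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([j for j in range(n_features) if j not in child]))
            children.append(tuple(sorted(child)))
        population = parents + children
    return max(population, key=lambda s: fitness(s, X, y))

X, y = make_data()
best = select_features(X, y)
print("selected feature indices:", best, "fitness:", fitness(best, X, y))
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. The paper's setting (1621 gene expressions reduced to 21 features) would need a larger `k` and a much cheaper or cached fitness function than this toy one.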
The main contribution of the paper is a new genetic clustering algorithm and a divergent random forest classifier that together reach an accuracy of 95.21%, a specificity of 93%, and a sensitivity of 94.29% on the multi-class cancer classification task, which the authors report as outperforming the other existing multi-class classification algorithms.
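Accuracy, specificity, and sensitivity are defined per class in a multi-class setting, and this page does not say how the paper aggregates them into single figures. The sketch below assumes one-vs-rest counts with macro averaging, which is one common convention. The function name `one_vs_rest_metrics` and the 3-class confusion matrix are hypothetical and are not taken from the paper.

```python
def one_vs_rest_metrics(confusion):
    """confusion[i][j] = number of samples with true class i predicted as class j.
    Assumes every class has at least one sample, so no denominator is zero."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    sensitivities, specificities = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # class c samples predicted as another class
        fp = sum(confusion[r][c] for r in range(n)) - tp  # other classes predicted as c
        tn = total - tp - fn - fp
        sensitivities.append(tp / (tp + fn))
        specificities.append(tn / (tn + fp))
    return {
        "accuracy": correct / total,
        "sensitivity": sum(sensitivities) / n,  # macro average over classes
        "specificity": sum(specificities) / n,  # macro average over classes
    }

# Made-up counts for a 3-class toy problem, for illustration only.
toy = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]
metrics = one_vs_rest_metrics(toy)
print(metrics)
```

Under this convention specificity is usually high when there are many classes, because each one-vs-rest split has many true negatives, so the three figures are not directly comparable to their binary counterparts.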

Multiclass Cancer Classification by Using Fuzzy Support Vector Machine and Binary Decision Tree with Gene Selection

Deep-Learning-Based Cancer Profiles Classification Using Gene Expression Data Profile

Multiclass cancer diagnosis using tumor gene expression signatures

Cancer prediction with gene expression profiling and differential evolution

Gene selection and classification for cancer microarray data based on machine learning and similarity measures

Gene selection for cancer classification using a hybrid of univariate and multivariate feature selection methods

Classification of human cancer diseases by gene expression profiles

The efficacy of various machine learning models for multi-class classification of RNA-seq expression data

Multi-scale supervised clustering-based feature selection for tumor classification and identification of biomarkers and targets on genomic data

Cancer Classification Utilizing Voting Classifier with Ensemble Feature Selection Method and Transcriptomic Data

MGRFE: Multilayer Recursive Feature Elimination Based on an Embedded Genetic Algorithm for Cancer Classification

Gene Selection for Cancer Classification using Support Vector Machines

Gene Selection Based Cancer Classification With Adaptive Optimization Using Deep Learning Architecture

ALL/AML Cancer Classification by Gene Expression Data Using SVM and CSVM Approach

A novel and innovative cancer classification framework through a consecutive utilization of hybrid feature selection

Comparative Study of Cancer Classification by Analysis of RNA-seq Gene Expression Levels

Gene Expression-Based Cancer Classification for Handling the Class Imbalance Problem and Curse of Dimensionality

Functional and Embedding Feature Analysis for Pan-Cancer Classification

Leveraging a Joint of Phenotypic and Genetic Features on Cancer Patient Subgrouping

Integrative Analysis of RNA Expression Data Unveils Distinct Cancer Types through Machine Learning Techniques