Abstract

Background and Objective: Microarray data combine a limited number of samples with high-dimensional features, which makes selecting a small number of features for disease diagnosis a challenging problem. Traditional feature selection methods based on evolutionary algorithms struggle to find the optimal feature subset within a limited time when the feature space is high-dimensional. This paper proposes a new approach to address these problems.

Methods: We propose a hybrid feature selection method (C-IFBPFE) for biomarker identification in microarray data. It combines clustering with an improved binary particle swarm optimization and incorporates an embedded feature elimination strategy. First, an adaptive redundant-feature judgment method based on correlation clustering screens the features, reducing the search space for the subsequent stage. Second, we propose an improved flipping probability-based binary particle swarm optimization (IFBPSO) that is better suited to binary optimization problems. Finally, we design a new feature elimination (FE) strategy embedded in the binary particle swarm optimization algorithm. This strategy gradually removes poorer features during the iterations to reduce the number of features and improve accuracy.

Results: We compared C-IFBPFE with other published hybrid feature selection methods on eight public datasets and analyzed the impact of each improvement. The proposed method outperforms other current state-of-the-art feature selection methods in terms of accuracy, number of features, sensitivity, and specificity. The ablation study validates the efficacy of each component; in particular, the proposed feature elimination strategy significantly improves the performance of the algorithm.

Conclusions: The hybrid feature selection method proposed in this paper helps address the problem of high-dimensional microarray data with few samples. It can select a small subset of features and achieve high classification accuracy on microarray datasets. Additionally, independent validation of the selected features shows that those chosen by C-IFBPFE correlate strongly with disease phenotypes, indicating that the method can identify important biomarkers in data related to biomedical problems.
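To make the three stages named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the update rules, so every detail here is an assumption chosen for illustration: a fixed correlation threshold stands in for the adaptive clustering-based redundancy judgment, the flip probability is a simple function of disagreement with the personal and global bests, the elimination step drops the least-used feature every few iterations, and the fitness is leave-one-out 1-nearest-neighbour accuracy with a small size penalty. All function names and parameter values are hypothetical.

```python
import numpy as np

def screen_redundant(X, y, threshold=0.9):
    """Stage 1 (assumed form): greedy correlation filter. A feature is kept
    only if it is not highly correlated with an already kept feature."""
    Xc = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    yc = (y - y.mean()) / (y.std() + 1e-12)
    relevance = np.abs(Xc.T @ yc) / len(y)      # |Pearson r| with the label
    kept = []
    for j in np.argsort(-relevance):            # most label-relevant first
        if all(abs(Xc[:, j] @ Xc[:, k]) / len(y) < threshold for k in kept):
            kept.append(j)
    return np.array(sorted(kept))

def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)                 # a sample cannot match itself
    return float((y[d.argmin(axis=1)] == y).mean())

def fitness(mask, X, y, alpha=0.01):
    """Accuracy minus a small penalty on the fraction of selected features."""
    if not mask.any():
        return 0.0
    return loo_1nn_accuracy(X[:, mask], y) - alpha * mask.mean()

def flip_bpso(X, y, n_particles=10, n_iter=20, eliminate_every=5, seed=0):
    """Stages 2 and 3 (assumed forms): flip-probability binary PSO with an
    embedded step that permanently removes one weak feature at a time."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    active = np.ones(n_feat, dtype=bool)        # features still searchable
    pos = rng.random((n_particles, n_feat)) < 0.5
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p, X, y) for p in pos])
    for it in range(1, n_iter + 1):
        gbest = pbest[pbest_fit.argmax()].copy()
        # A bit is more likely to flip when it disagrees with the bests.
        p_flip = 0.02 + 0.3 * (pos != pbest) + 0.3 * (pos != gbest)
        flips = rng.random(pos.shape) < p_flip
        pos = (pos ^ flips) & active            # eliminated bits stay off
        fit = np.array([fitness(p, X, y) for p in pos])
        better = fit > pbest_fit
        pbest[better] = pos[better]
        pbest_fit[better] = fit[better]
        if it % eliminate_every == 0:
            g = pbest_fit.argmax()
            counts = pbest.sum(axis=0).astype(float)
            counts[~active | pbest[g]] = np.inf  # protect the best subset
            j = counts.argmin()                  # least-used active feature
            if np.isfinite(counts[j]):
                active[j] = False
                pos[:, j] = False
                pbest[:, j] = False
                # Personal bests changed, so their fitness is recomputed.
                pbest_fit = np.array([fitness(p, X, y) for p in pbest])
    g = pbest_fit.argmax()
    return pbest[g], float(pbest_fit[g])
```

A typical call would first screen the features with `kept = screen_redundant(X, y)` and then run `flip_bpso(X[:, kept], y)`, so that the swarm searches only the reduced feature space. The returned boolean mask indexes into the screened columns, not the original ones.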