Stable feature selection based on probability estimation in gene expression datasets

Melika Ahmadi, Hamid Mahmoodian
DOI: https://doi.org/10.1016/j.eswa.2024.123372
IF: 8.5
2024-02-06
Expert Systems with Applications
Abstract: Knowledge discovery from large datasets is one of the major challenges in pattern recognition. Even more important is how reliable the extracted information and the resulting models are. Studies have shown that such models usually depend strongly on the samples, features, data, and model structure, so stability is a key concern in model building. This paper presents a method that accounts for the effect of several well-known classifiers and seeks a stable model for separating samples by combining different feature selection methods under a stability criterion. In brief, the contributions are: 1) analyzing the ability of individual features to classify samples using different well-known classifiers, 2) estimating the probability that each feature is selected into high-impact feature sets, and 3) applying the stability concept to increase the weight of robust feature sets. The proposed algorithm is used to select high-impact genes in microarray datasets. Three high-dimensional gene expression datasets of cancerous tissues are used as benchmarks. The results show relative superiority over other methods.
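The three steps listed in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give its details, so everything here is an assumption made for illustration. A single filter score (difference of class means over pooled spread) stands in for the "different well-known classifiers", selection probability is estimated by repeated stratified subsampling, and stability is measured as the mean Jaccard similarity between selected subsets. The names `feature_score`, `top_k_features`, and `stable_selection` are hypothetical.

```python
import random
from statistics import mean, pstdev


def feature_score(column, labels):
    """Score one feature: |mean(class 0) - mean(class 1)| over the summed spreads.

    Assumption: a simple filter score used in place of per-classifier accuracy.
    """
    a = [v for v, y in zip(column, labels) if y == 0]
    b = [v for v, y in zip(column, labels) if y == 1]
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)


def top_k_features(X, labels, k):
    """Return the indices of the k best-scoring features as a frozenset."""
    n_features = len(X[0])
    scores = [feature_score([row[j] for row in X], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return frozenset(ranked[:k])


def jaccard(s, t):
    """Jaccard similarity of two non-empty feature subsets."""
    return len(s & t) / len(s | t)


def stable_selection(X, labels, k=2, n_rounds=30, frac=0.8, seed=0):
    """Estimate a stability-weighted selection probability for each feature.

    Each round draws a stratified subsample, selects the top-k features, and
    stores the subset. A subset's weight is its mean Jaccard similarity to all
    other subsets, so subsets that recur across subsamples count for more.
    Requires n_rounds >= 2 and binary labels 0/1.
    """
    rng = random.Random(seed)
    by_class = {c: [i for i, y in enumerate(labels) if y == c] for c in (0, 1)}
    subsets = []
    for _ in range(n_rounds):
        idx = []
        for members in by_class.values():
            idx += rng.sample(members, max(1, int(frac * len(members))))
        subsets.append(top_k_features([X[i] for i in idx], [labels[i] for i in idx], k))

    weights = [
        mean(jaccard(s, t) for j, t in enumerate(subsets) if j != i)
        for i, s in enumerate(subsets)
    ]
    total = sum(weights)
    if total == 0:  # all subsets disjoint: fall back to uniform weights
        weights, total = [1.0] * len(subsets), float(len(subsets))

    n_features = len(X[0])
    return [
        sum(w for w, s in zip(weights, subsets) if j in s) / total
        for j in range(n_features)
    ]
```

Under these assumptions, `stable_selection(X, labels)` takes `X` as a list of sample rows and `labels` as a list of 0/1 class labels, and returns one weighted probability per feature. Because every subset contains exactly `k` features, the returned values sum to `k`; features with values near 1 were chosen in almost every subsample.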
computer science, artificial intelligence, engineering, electrical & electronic, operations research & management science