An improved conditional relevance and weighted redundancy feature selection method for gene expression data

Xiwen Qin, Siqi Zhang, Xiaogang Dong, Tingru Luo, Hongyu Shi, Liping Yuan
DOI: https://doi.org/10.1007/s11227-024-06714-5
IF: 3.3
2024-12-06
The Journal of Supercomputing
Abstract: Selecting relevant features from gene expression data is important for disease diagnosis, drug development, and clinical decision making. Most existing information-theoretic feature selection methods do not account for the effect of already-selected features on feature relevance, and they ignore the different effects of conditional redundancy and interaction information on feature redundancy. This paper proposes a feature selection method based on maximum conditional relevance and minimum weighted redundancy (MCRMWR). MCRMWR uses conditional mutual information to measure the relevance between a candidate feature and the class, and introduces a weighted redundancy term in which a weighting factor distinguishes the contributions of interaction information and conditional redundancy, so that redundancy between features is assessed more accurately. To validate its performance, MCRMWR was compared with a variety of feature selection methods on 12 gene expression datasets. The experimental results showed that the proposed method was significantly better than the other feature selection methods in terms of classification accuracy and F1 score. Statistical tests also show that the proposed method differs significantly from the other feature selection methods.
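The information-theoretic quantities named in the abstract can be illustrated with a minimal Python sketch for discrete (already discretized) features. This is not the MCRMWR criterion itself: the abstract does not give the scoring formula or the weighting factor, so only the standard textbook definitions are shown. The function names and the toy XOR data are assumptions made for illustration, and the sign convention used for interaction information is one of several in the literature.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def interaction_information(x, y, z):
    """I(X;Y|Z) - I(X;Y); positive when Z makes X more informative about Y."""
    return conditional_mutual_information(x, y, z) - mutual_information(x, y)


# Toy data (hypothetical): the class is the XOR of a candidate feature
# and an already-selected feature.
candidate = [0, 0, 1, 1]
selected = [0, 1, 0, 1]
label = [a ^ b for a, b in zip(candidate, selected)]

print(mutual_information(candidate, label))                        # 0.0
print(conditional_mutual_information(candidate, label, selected))  # 1.0
```

In this toy case the candidate feature looks irrelevant on its own (zero mutual information with the class) but carries one full bit once the selected feature is known, which is the kind of effect a conditional relevance term is meant to capture.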
computer science, theory & methods; engineering, electrical & electronic; hardware & architecture