Feature Selection Considering Two Types of Feature Relevancy and Feature Interdependency.
Liang Hu, Wanfu Gao, Kuo Zhao, Ping Zhang, Feng Wang
DOI: https://doi.org/10.1016/j.eswa.2017.10.016
IF: 8.5
2017-01-01
Expert Systems with Applications
Abstract: Feature selection based on information theory, which selects a group of the most informative features, is widely applied in fields such as machine learning, data mining and natural language processing. However, many previous methods suffer from two common defects. (1) Feature relevancy is defined without distinguishing candidate feature relevancy from selected feature relevancy. (2) Some interdependent features may be misinterpreted as redundant features. In this study, we propose a feature selection method named Dynamic Relevance and Joint Mutual Information Maximization (DRJMIM) to address these two defects. DRJMIM includes four stages. First, relevancy is divided into two categories: candidate feature relevancy and selected feature relevancy. Second, some redundant features are selected according to candidate feature relevancy, which is measured by joint mutual information. Third, the redundant features are combined with a dynamic weight to reduce the selection probability of truly redundant features while increasing that of falsely redundant ones. Finally, the most informative and interdependent features are selected and truly redundant features are eliminated simultaneously. Furthermore, our method is compared with five competitive feature selection methods on 12 publicly available data sets. The classification results show that DRJMIM performs better than the other five methods. The statistical significance of this result is verified by a paired two-tailed t-test. Meanwhile, DRJMIM reaches its highest classification accuracy with a small number of selected features. © 2017 Elsevier Ltd. All rights reserved.
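
The distinction the abstract draws between redundant and interdependent features can be illustrated with a small sketch. The code below is not the DRJMIM criterion: the abstract does not give the dynamic weight, so it is omitted. As an assumption, the sketch uses a plain max-min joint mutual information rule on discrete features, and the function names (`entropy`, `mutual_information`, `joint_mutual_information`, `greedy_joint_mi`) are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Shannon entropy (in bits) of the joint distribution of discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * log2(c / n) for c in Counter(joint).values())


def mutual_information(x, y):
    """I(x; y) = H(x) + H(y) - H(x, y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def joint_mutual_information(x, s, y):
    """I(x, s; y) = H(x, s) + H(y) - H(x, s, y)."""
    return entropy(x, s) + entropy(y) - entropy(x, s, y)


def greedy_joint_mi(features, labels, k):
    """Greedy selection of k feature indices (illustrative assumption, not DRJMIM).

    The first feature maximizes I(f; labels). Each later feature maximizes
    the minimum, over already selected features s, of I(f, s; labels).
    """
    remaining = list(range(len(features)))
    first = max(remaining, key=lambda j: mutual_information(features[j], labels))
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda j: min(
                joint_mutual_information(features[j], features[s], labels)
                for s in selected
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# XOR toy data: neither a nor b is informative alone, but together they
# determine the label, so they are interdependent rather than redundant.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
a_copy = list(a)  # an exact duplicate of a, i.e. a truly redundant feature
y = [0, 1, 1, 0]  # y = a XOR b
```

On this toy data, `mutual_information(a, y)` is 0 while `joint_mutual_information(a, b, y)` is 1 bit, so a criterion that scores features only individually would discard both `a` and `b`. Calling `greedy_joint_mi([a, b, a_copy], y, 2)` picks `a` first (ties resolve to the lowest index) and then prefers `b` over the duplicate `a_copy`.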