Clustering-based feature subset selection with analysis on the redundancy–complementarity dimension
Zhijun Chen, Qiushi Chen, Yishi Zhang, Lei Zhou, Junfeng Jiang, Chaozhong Wu, Zhen Huang
DOI: https://doi.org/10.1016/j.comcom.2021.01.005
IF: 5.047
2021-02-01
Computer Communications
Abstract:<p>In the era of big data, dimensionality reduction plays an important role in many fields driven by machine learning and data mining techniques. Existing information-theoretic feature selection algorithms generally reduce the dimension by selecting the features with maximum class-relevance and minimum redundancy, but they largely overlook the complementary correlation among features and sometimes handle it improperly. This paper proposes a novel feature subset selection algorithm called Clustering-based Feature Selection with Redundancy-Complementarity Analysis (CFSRCA). The proposed algorithm consists of two main steps: (a) selecting the candidate class-relevant features, and (b) selecting the <em>representative</em> features. In the latter step, the <em>representative</em> features are defined as the features with minimum redundancy and maximum complementarity, and a clustering method based on the minimum spanning tree (MST) is proposed to identify them effectively. To validate the effectiveness of CFSRCA, it is compared with three feature selection algorithms (ReliefF, CFS, and FOU) using four well-known classifiers (C4.5, SVM, kNN, and NBC) in classification experiments on eight datasets. Experimental results verify the effectiveness of the proposed feature subset selection algorithm.</p>
computer science, information systems; telecommunications; engineering, electrical & electronic
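
The two-step procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' CFSRCA implementation: the abstract does not give the relevance, redundancy, or complementarity measures, so the sketch assumes symmetric uncertainty as a stand-in for both relevance and redundancy, does not model complementarity at all, and uses hypothetical names and thresholds (`select_features`, `min_relevance`, `cut`). It only shows the general shape of filtering class-relevant features, clustering them by cutting long edges of a minimum spanning tree, and keeping one representative per cluster.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs, ys):
    """SU = 2*I(X;Y) / (H(X) + H(Y)), in [0, 1]; 0 if both are constant."""
    hx, hy = entropy(xs), entropy(ys)
    mutual_info = hx + hy - entropy(list(zip(xs, ys)))
    return 2.0 * mutual_info / (hx + hy) if hx + hy > 0 else 0.0


class DisjointSet:
    """Union-find structure used for Kruskal's algorithm and for clustering."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)


def mst_edges(nodes, dist):
    """Kruskal's algorithm on the complete graph; returns (weight, u, v) edges."""
    sets = DisjointSet(nodes)
    edges = sorted((dist(u, v), u, v)
                   for i, u in enumerate(nodes) for v in nodes[i + 1:])
    tree = []
    for w, u, v in edges:
        if sets.find(u) != sets.find(v):
            sets.union(u, v)
            tree.append((w, u, v))
    return tree


def select_features(features, labels, min_relevance=0.05, cut=0.5):
    """Hypothetical two-step selection; thresholds are illustrative only."""
    # Step (a): keep candidate features that are relevant to the class.
    relevance = {f: symmetric_uncertainty(col, labels)
                 for f, col in features.items()}
    candidates = [f for f in features if relevance[f] > min_relevance]

    # Step (b): build an MST where highly redundant features are close.
    def dist(u, v):
        return 1.0 - symmetric_uncertainty(features[u], features[v])

    tree = mst_edges(candidates, dist)

    # Cutting MST edges longer than `cut` leaves connected components,
    # which serve as the feature clusters.
    sets = DisjointSet(candidates)
    for w, u, v in tree:
        if w <= cut:
            sets.union(u, v)
    clusters = {}
    for f in candidates:
        clusters.setdefault(sets.find(f), []).append(f)

    # Assumed representative rule: the most class-relevant cluster member.
    return sorted(max(members, key=relevance.get)
                  for members in clusters.values())


# Toy data: "b" duplicates the information in "a", "noise" is irrelevant.
demo_labels = [0, 0, 0, 0, 1, 1, 1, 1]
demo_features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [1, 1, 1, 1, 0, 0, 0, 0],
    "c": [0, 0, 0, 1, 0, 1, 1, 1],
    "noise": [0, 0, 1, 1, 0, 0, 1, 1],
}
selected = select_features(demo_features, demo_labels)
print(selected)
```

In this toy run, step (a) discards `noise`, and step (b) groups the mutually redundant `a` and `b` into one cluster while `c` forms its own, so one feature survives per cluster. A faithful reproduction would replace the distance and the representative rule with the redundancy and complementarity criteria defined in the paper itself.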