A Supervised Feature Selection Method For Mixed-Type Data using Density-based Feature Clustering

Xuyang Yan, Mrinmoy Sarkar, Biniam Gebru, Shabnam Nazmi, Abdollah Homaifar
DOI: https://doi.org/10.48550/arXiv.2111.08169
IF: 5.414
2021-11-10
Machine Learning
Abstract: Feature selection methods are widely used to reduce the high computational overhead and mitigate the curse of dimensionality when classifying high-dimensional data. Most conventional feature selection methods are designed for homogeneous features, whereas real-world datasets often contain a mixture of continuous and discrete features. Some recent mixed-type feature selection methods select only features that are highly relevant to the class labels and ignore redundancy among features. Determining an appropriate feature subset is a further challenge. This paper proposes a supervised feature selection method using density-based feature clustering (SFSDFC) to obtain an appropriate final feature subset for mixed-type data. SFSDFC first decomposes the feature space into a set of disjoint feature clusters using a novel density-based clustering method. It then applies an effective feature selection strategy to extract, from those clusters, a subset of important features with minimal redundancy. SFSDFC is evaluated through extensive experiments on thirteen real-world benchmark datasets, including comparisons with five state-of-the-art methods, and the results demonstrate its efficacy.
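The abstract describes a two-stage idea: group features into disjoint clusters with a density-based method, then keep important, non-redundant features from each cluster. The abstract does not specify the clustering algorithm or the relevance and redundancy measures, so the sketch below is only an illustration of the general cluster-then-select pattern, not the SFSDFC algorithm. It assumes (all choices are swapped-in stand-ins): continuous features are equal-width discretized, symmetric uncertainty (SU) measures both feature-label relevance and feature-feature redundancy, standard DBSCAN clusters features on a `1 - SU` distance, and the single most label-relevant feature is kept from each cluster. Every function and parameter name here (`select_features`, `eps`, `min_pts`, `min_relevance`, `bins`) is hypothetical.

```python
import math
from collections import Counter


def discretize(values, bins=5):
    """Equal-width binning (an assumption) so continuous features share discrete MI estimates."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy(xs):
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), a normalized dependence score in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # both constant: treated as carrying no information
    mi = hx + hy - entropy(list(zip(xs, ys)))
    return 2.0 * mi / (hx + hy)


def dbscan(dist, eps, min_pts):
    """Standard DBSCAN on a precomputed distance matrix; returns cluster ids, -1 = noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def select_features(columns, is_continuous, target,
                    eps=0.3, min_pts=2, min_relevance=0.05, bins=5):
    """Hypothetical cluster-then-select: keep the most label-relevant feature per cluster."""
    cols = [discretize(c, bins) if cont else list(c)
            for c, cont in zip(columns, is_continuous)]
    n = len(cols)
    # Feature-to-feature distance 1 - SU: redundant features end up close together.
    dist = [[1.0 - symmetric_uncertainty(cols[i], cols[j]) for j in range(n)]
            for i in range(n)]
    relevance = [symmetric_uncertainty(c, target) for c in cols]
    labels = dbscan(dist, eps, min_pts)
    groups = {}
    for idx, lab in enumerate(labels):
        # Noise features are not redundant with anything, so each forms its own group.
        key = ("noise", idx) if lab == -1 else ("cluster", lab)
        groups.setdefault(key, []).append(idx)
    selected = []
    for members in groups.values():
        best = max(members, key=lambda i: relevance[i])
        if relevance[best] >= min_relevance:  # drop groups with negligible relevance
            selected.append(best)
    return sorted(selected)


# Toy mixed-type data: two redundant continuous features, one irrelevant
# categorical feature, and one partially relevant categorical feature.
y = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [1.0, 1.2, 1.1, 1.3, 5.0, 5.2, 5.1, 5.3],
    [13.0, 15.0, 14.0, 16.0, 53.0, 55.0, 54.0, 56.0],
    ["r", "g", "r", "g", "r", "g", "r", "g"],
    ["x", "x", "x", "y", "y", "y", "y", "y"],
]
is_continuous = [True, True, False, False]
print(select_features(columns, is_continuous, y))  # [0, 3]
```

On the toy data, features 0 and 1 fall into one cluster because they are fully redundant, so only one of them is kept; feature 2 is independent of the label and is dropped; feature 3 is kept as a singleton. Using a single normalized measure for both relevance and redundancy keeps continuous and categorical features on a common scale, which is one simple way to handle mixed-type data, though it depends on the discretization choice.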