Rethinking Embedded Unsupervised Feature Selection: A Simple Joint Approach

Heng Chang, Jun Guo, Wenwu Zhu
DOI: https://doi.org/10.1109/tbdata.2022.3178715
2022-01-01
IEEE Transactions on Big Data
Abstract: Recently, various embedded methods for unsupervised feature selection have been put forward. However, most of them adopt a two-step strategy, i.e., selecting $k$ top-ranked dimensions according to a learned order of all features, then conducting K-means clustering for evaluation. This commonly used strategy usually results in a group of sub-optimal features, because the selected $k$ top-ranked features are seldom the desired top-$k$ dimensions. To address this problem, we rethink the two steps in a joint manner and propose a simple yet effective approach called Unsupervised Feature Selection with Separability (UFS$^{2}$) to simultaneously select features and cluster data. More specifically, a binary vector is seamlessly integrated into K-means to select an exact number of features for clustering. Different from previous embedded methods involving $l_{2,1}$-norm, our joint model explicitly uses the parameter $k$ (i.e., the number of selected features). Afterwards, a customized term for the binary vector is designed to maximize the separability among selected feature dimensions. In order to solve the formulated 0-1 integer programming problem, an iterative algorithm is developed. Finally, we evaluate the proposed approach extensively on different datasets. Despite the relative simplicity, UFS$^{2}$ remarkably and generally outperforms state-of-the-art baselines.
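The joint idea described in the abstract (a binary vector with exactly $k$ ones that is updated together with the K-means assignment) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the objective or the form of the separability term, so the sketch assumes a simple stand-in score (per-feature between-cluster scatter divided by total scatter) and a plain alternating loop. The function names `kmeans`, `separability_scores`, and `joint_select_and_cluster`, the farthest-point initialization, and the choice to start from all features are all hypothetical choices made for illustration.

```python
import numpy as np


def kmeans(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's K-means with farthest-point initialization (an assumption)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, n_clusters):
        # Squared distance from every point to its nearest chosen center.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(n_clusters):
            members = X[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels


def separability_scores(X, labels):
    """Stand-in score (NOT the paper's term): between-cluster / total scatter per feature."""
    overall_mean = X.mean(axis=0)
    total = ((X - overall_mean) ** 2).sum(axis=0)
    between = np.zeros(X.shape[1])
    for c in np.unique(labels):
        Xc = X[labels == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
    return between / (total + 1e-12)


def joint_select_and_cluster(X, k, n_clusters, n_rounds=10, seed=0):
    """Alternate between clustering on the selected features and re-selecting exactly k features."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    s = np.ones(d, dtype=bool)  # assumption: start with every feature selected
    for _ in range(n_rounds):
        labels = kmeans(X[:, s], n_clusters, seed=seed)
        scores = separability_scores(X, labels)
        new_s = np.zeros(d, dtype=bool)
        new_s[np.argsort(scores)[-k:]] = True  # binary vector with exactly k ones
        if np.array_equal(new_s, s):
            break
        s = new_s
    # Final assignment uses only the selected features.
    labels = kmeans(X[:, s], n_clusters, seed=seed)
    return s, labels
```

The point of the sketch is the contrast with the two-step strategy: the binary mask `s` is re-estimated inside the clustering loop and always contains exactly `k` ones, so `k` is an explicit parameter of the procedure and not a cutoff applied to a ranking afterwards. Like any alternating heuristic, it may stop at a local optimum that depends on initialization.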