Unsupervised Feature Selection with the Largest Angle Coding

Tianyi Huang, William Zhu
DOI: https://doi.org/10.1080/23799927.2017.1330283
2017-01-01
International Journal of Computer Mathematics: Computer Systems Theory
Abstract: In many areas such as machine learning, data mining and computer vision, feature selection is a crucial and challenging task that aims to find a relevant subset of the original features. Unsupervised feature selection performs this task without label information. Many unsupervised feature selection methods select the top-ranked features without analysing the differences among features, so they cannot select a feature subset with strong generality. By analysing the differences among features, unsupervised feature selection can choose features that describe the original dataset more comprehensively. In this paper, we propose the difference degree matrix and a new method called unsupervised feature selection with the largest angle coding (FSAC). The difference degree matrix describes how differently the data points are distributed on every pair of features, and FSAC is an effective feature selection method. Unlike existing unsupervised feature selection methods, FSAC selects features through the analysis of the differences among features and the self-representation of the difference degree matrix. To make this self-representation more useful and to reduce redundant and noisy features, a -norm constraint is added to the objective function of FSAC so that the feature selection matrix is sparse in its rows. Experimental results on different real-world datasets show that FSAC achieves promising performance and outperforms state-of-the-art methods. We also analyse the sensitivity of the parameter in the objective function.
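The abstract does not give the exact definitions, so the following is only a minimal sketch of the general pipeline it describes (build a feature-by-feature difference matrix, self-represent it under a row-sparsity penalty, rank features by row norms), under several loudly stated assumptions. The "difference degree" here is a hypothetical stand-in: the angle between each pair of centred feature columns. The row-sparsity penalty is assumed to be the common l2,1-norm, and the solver is the standard iteratively reweighted scheme for l2,1 problems, not necessarily the one used in FSAC. All function names (`angle_difference_matrix`, `row_sparse_self_representation`, `select_features`) and the parameter `lam` are hypothetical.

```python
import numpy as np


def angle_difference_matrix(X):
    """HYPOTHETICAL d x d 'difference degree' matrix: pairwise angle (radians)
    between centred feature columns of X (n samples x d features).
    Not the paper's definition, which the abstract does not state."""
    Xc = X - X.mean(axis=0)
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = 1.0  # constant features: avoid division by zero
    U = Xc / norms
    cos = np.clip(U.T @ U, -1.0, 1.0)  # clip guards arccos against rounding
    return np.arccos(cos)


def row_sparse_self_representation(D, lam=1.0, n_iter=50, eps=1e-8):
    """Approximately minimise ||D - D W||_F^2 + lam * ||W||_{2,1}.

    ASSUMPTION: l2,1 row-sparsity penalty. Setting the gradient to zero gives
    (D^T D + lam * Lam) W = D^T D with Lam = diag(1 / (2 ||w_i||)), which is
    solved repeatedly while Lam is updated (iterative reweighting)."""
    d = D.shape[1]
    G = D.T @ D
    W = np.eye(d)
    for _ in range(n_iter):
        row_norms = np.sqrt(np.sum(W ** 2, axis=1) + eps)  # eps keeps Lam finite
        Lam = np.diag(1.0 / (2.0 * row_norms))
        W = np.linalg.solve(G + lam * Lam, G)  # G + lam*Lam is positive definite
    return W


def select_features(X, k, lam=1.0):
    """Rank features by the l2 norm of their row in W; return the top-k indices."""
    D = angle_difference_matrix(X)
    W = row_sparse_self_representation(D, lam=lam)
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1][:k], scores
```

For example, `select_features(np.random.default_rng(0).normal(size=(100, 8)), k=3)` returns the indices of the three highest-scoring features and the per-feature scores. Rows of `W` that the penalty drives towards zero correspond to features that contribute little to reconstructing the difference matrix, which is why row norms serve as selection scores.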