Graph-Based Semi-supervised Feature Selection with Application to Automatic Spam Image Identification

Hongrong Cheng, Wei Deng, Chong Fu, Yong Wang, Zhiguang Qin
DOI: https://doi.org/10.1007/978-3-642-22691-5_45
2011-01-01
Abstract: In this paper, we propose a new spectral semi-supervised feature selection criterion called the s-Laplacian score. It identifies discriminative features by measuring their capability to preserve both local and global geometric structure. Spectral feature selection cannot handle redundant features; to address this limitation, we define the Classification Information Gain degree (CIG) to measure feature redundancy. Based on the s-Laplacian score and CIG, we propose a graph-based semi-supervised feature selection algorithm (GSFS). Experimental results on a real-world image dataset for automatic spam image identification show that GSFS makes effective use of a small number of labeled samples and a large amount of unlabeled data to select discriminative features.
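The abstract does not define the s-Laplacian score itself, so the sketch below is not the paper's criterion. It illustrates only the general idea of a graph-based spectral feature score, using the classic unsupervised Laplacian score, which the s-Laplacian score presumably extends with label information. The function name `laplacian_score`, the neighbor count `k`, and the heat-kernel width `t` are assumptions chosen for illustration.

```python
import numpy as np

def laplacian_score(X, k=5, t=1.0):
    """Classic unsupervised Laplacian score per feature (lower is better).

    Illustrative only: this is NOT the s-Laplacian score from the paper.
    X has shape (n_samples, n_features); k and t are assumed defaults.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbor

    # k-nearest-neighbor graph with heat-kernel edge weights.
    idx = np.argsort(d2, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-d2[rows, cols] / t)
    S = np.maximum(S, S.T)  # symmetrize the graph

    d = S.sum(axis=1)       # node degrees
    L = np.diag(d) - S      # graph Laplacian

    # Remove the degree-weighted mean of every feature.
    Xc = X - (d @ X) / d.sum()

    num = np.sum(Xc * (L @ Xc), axis=0)            # f^T L f per feature
    den = np.sum(Xc * (d[:, None] * Xc), axis=0)   # f^T D f per feature

    scores = np.full(X.shape[1], np.inf)  # constant features rank last
    ok = den > 1e-12
    scores[ok] = num[ok] / den[ok]
    return scores
```

A feature that varies smoothly over the neighborhood graph while having high overall variance receives a low score, which is the sense in which such criteria reward preserving geometric structure. Ranking features by ascending score and keeping the top few gives a simple selection baseline; a redundancy measure such as CIG would then have to be applied separately, since this score evaluates each feature in isolation.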