BoostFS: A Boosting-Based Irrelevant Feature Selection Algorithm

Qi-Guang Miao, Ying Cao, Jian-Feng Song, Jiachen Liu, Yining Quan
DOI: https://doi.org/10.1142/s0218001415510118
IF: 1.261
2015-01-01
International Journal of Pattern Recognition and Artificial Intelligence
Abstract: Features play a fundamental role in a learning process. In this paper, we propose a Boosting-based feature selection algorithm called BoostFS. It extends AdaBoost, which was designed for classification problems, to feature selection. BoostFS maintains a distribution over the training samples, initialized to the uniform distribution. In each iteration, a decision stump is trained under the current sample distribution, and the distribution is then adjusted so that it is orthogonal to the classification results of all the stumps generated so far. Because each decision stump tests a single feature, it can be regarded as one selected feature, so BoostFS is able to select a subset of features that are as irrelevant to each other as possible. Experimental results on synthetic datasets, five UCI datasets, and a real malware detection dataset all show that the features selected by BoostFS help to improve learning algorithms in classification problems, especially when the original feature set contains redundant features.
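The loop described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses the standard AdaBoost reweighting step, which makes the new distribution orthogonal only to the most recent stump (that stump's weighted error becomes 1/2), whereas the abstract describes orthogonality to all generated stumps. The function names `best_stump` and `boost_select`, the exhaustive threshold search, the rule of skipping features that were already selected, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def best_stump(X, y, w, skip):
    """Find the decision stump with the lowest weighted error.

    Returns (error, feature, threshold, polarity, predictions),
    or None if every feature is in `skip`.
    """
    best = None
    for j in range(X.shape[1]):
        if j in skip:
            continue
        for thr in np.unique(X[:, j]):
            base = np.where(X[:, j] <= thr, -1, 1)
            for pol in (1, -1):
                pred = pol * base
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, thr, pol, pred)
    return best


def boost_select(X, y, k):
    """Select up to k features; labels y must be in {-1, +1}."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)  # uniform initial sample distribution
    selected = []
    for _ in range(k):
        found = best_stump(X, y, w, set(selected))
        if found is None:
            break
        err, j, _, _, pred = found
        selected.append(j)
        # Standard AdaBoost update (an assumption, see the lead-in):
        # afterwards this stump has weighted error 1/2 under w.
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()
    return selected


# Toy data (hypothetical): feature 1 duplicates feature 0,
# feature 2 is informative but errs on different samples.
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])
X = np.array([
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
])
selected = boost_select(X, y, 2)
print(selected)
```

On this toy data the sketch picks feature 0 first and then feature 2. After reweighting, the duplicate feature 1 carries no advantage over chance under the new distribution, which is the redundancy-avoiding behavior the abstract attributes to the orthogonality condition.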