An Ensemble Multi-Label Feature Selection Algorithm Based on Information Entropy

Shining Li, Zhenhai Zhang, Jiaqi Duan
2014-01-01
Abstract: In multilabel classification, feature selection removes redundant and irrelevant features, which makes classifiers faster and improves their prediction performance. Currently, most feature selection algorithms for multilabel classification depend on a concrete classifier, which leads to high computational complexity. This paper therefore proposes an Ensemble Multilabel Feature Selection algorithm based on Information Entropy (EMFSIE), which is independent of any concrete classifier. Its core idea is to use information gain to evaluate the correlation between each feature and the label set, so that useful features can be selected more effectively. We calculate the information gain in an ensemble framework and select useful features according to a threshold value determined by the effective factor. We validate EMFSIE on four datasets from two domains using four different multilabel classifiers. The experimental results and their analysis show preliminarily that EMFSIE can remove more than 70% of the original features, which makes the classifiers faster, while keeping their prediction performance as good as before; on three datasets it even improves prediction performance under two-tailed paired t-tests at the 0.05 significance level.
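The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, not the authors' EMFSIE implementation. It assumes discrete features, scores each feature by summing its information gain over all labels, averages the scores over bootstrap resamples as a stand-in for the ensemble framework, and keeps features whose average score reaches a threshold. The names `ensemble_select`, `n_rounds`, and `factor`, and the rule `threshold = factor * max(score)`, are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, label):
    """IG(feature; label) = H(feature) + H(label) - H(feature, label)."""
    joint = list(zip(feature, label))
    return entropy(feature) + entropy(label) - entropy(joint)


def feature_scores(X, Y):
    """Score each feature by summing its information gain over all labels."""
    n_features, n_labels = len(X[0]), len(Y[0])
    label_cols = [[row[k] for row in Y] for k in range(n_labels)]
    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        scores.append(sum(information_gain(col, lab) for lab in label_cols))
    return scores


def ensemble_select(X, Y, n_rounds=10, factor=0.5, seed=0):
    """Average scores over bootstrap resamples, then apply a threshold.

    Assumption: the threshold is factor * (largest average score). This rule
    is hypothetical; the paper's effective factor may be defined differently.
    """
    rng = random.Random(seed)
    n = len(X)
    totals = [0.0] * len(X[0])
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        round_scores = feature_scores([X[i] for i in idx], [Y[i] for i in idx])
        totals = [t + s for t, s in zip(totals, round_scores)]
    avg = [t / n_rounds for t in totals]
    threshold = factor * max(avg)
    selected = [j for j, s in enumerate(avg) if s >= threshold]
    return selected, avg


if __name__ == "__main__":
    # Toy data: feature 0 copies the label, feature 1 is constant.
    X = [[0, 0], [1, 0]] * 10
    Y = [[0], [1]] * 10
    print(ensemble_select(X, Y))
```

Because the scores depend only on the data and not on any classifier, a filter of this kind can be run once and its output reused with different multilabel classifiers, which is the property the abstract emphasizes.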