Boosted Near-miss Under-sampling on SVM ensembles for concept detection in large-scale imbalanced datasets

Lei Bao, Cao Juan, Jintao Li, Yongdong Zhang
DOI: https://doi.org/10.1016/j.neucom.2014.05.096
IF: 6
2016-01-01
Neurocomputing
Abstract: To address the challenges of using SVMs to learn concepts from large-scale imbalanced datasets, we propose a new method: Boosted Near-miss Under-sampling on SVM ensembles (BNU-SVMs). BNU-SVMs follows the under-sampling ensemble framework: a sequence of SVMs is trained, and the training set for each base SVM is selected by a Boosted Near-miss Under-sampling technique. More specifically, by adaptively updating weights over the negative examples, the most near-miss negative examples in the output space are selected in each iteration. Because under-sampling balances and reduces the training set and the ensemble improves classifier performance, BNU-SVMs is a promising solution to large-scale, imbalanced problems. Moreover, the negative examples selected by BNU-SVMs not only include the most representative ones from the data-distribution perspective but also cover the easily misclassified ones from the accuracy perspective. BNU-SVMs is therefore expected to outperform previous methods. In addition, to reduce the computation cost caused by high-dimensional visual features, we propose a kernel-distance pre-computation technique that further improves the efficiency of BNU-SVMs. Experiments on TRECVID benchmark datasets show that BNU-SVMs significantly outperforms previous methods, demonstrating that it is both an effective and an efficient solution to concept detection in large-scale imbalanced datasets.
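The sampling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the weight-update rule, so the exponential update below is an assumption, as are all function names and parameters. To stay dependency-free, a linear SVM trained by Pegasos-style sub-gradient descent stands in for the kernel SVMs used in the paper, and the kernel-distance pre-computation is not shown.

```python
import math
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Stand-in base learner: linear SVM via Pegasos-style sub-gradient descent."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]            # constant feature acts as a (regularised) bias
    w = [0.0] * len(Xb[0])
    for t in range(1, epochs * len(Xb) + 1):
        i = rng.randrange(len(Xb))
        eta = 1.0 / (lam * t)                    # decaying step size
        margin = y[i] * sum(wk * xk for wk, xk in zip(w, Xb[i]))
        w = [(1.0 - eta * lam) * wk for wk in w]  # shrink from the L2 regulariser
        if margin < 1.0:                          # hinge loss active: step towards y_i * x_i
            w = [wk + eta * y[i] * xk for wk, xk in zip(w, Xb[i])]
    return w

def ensemble_score(models, x):
    """Average decision value of all base SVMs trained so far."""
    xb = list(x) + [1.0]
    return sum(sum(wk * xk for wk, xk in zip(w, xb)) for w in models) / len(models)

def bnu_ensemble(pos, neg, rounds=3, seed=0):
    """Train `rounds` base SVMs, each on all positives plus an equal number of negatives."""
    rng = random.Random(seed)
    k = min(len(pos), len(neg))
    weights = [1.0 / len(neg)] * len(neg)        # start uniform over the negatives
    models, chosen_log = [], []
    for t in range(rounds):
        # Take the k highest-weight negatives; random tie-breaking makes
        # the first round a plain random under-sample.
        order = sorted(range(len(neg)), key=lambda j: (-weights[j], rng.random()))
        chosen = order[:k]
        X = list(pos) + [neg[j] for j in chosen]
        y = [1] * len(pos) + [-1] * k
        models.append(train_linear_svm(X, y, seed=seed + t))
        # Assumed update rule: negatives whose ensemble output lies closest to
        # (or across) the positive side gain weight. The tiny floor keeps
        # every weight strictly positive.
        outputs = [ensemble_score(models, x) for x in neg]
        top = max(outputs)
        weights = [wj * math.exp(o - top) + 1e-12 for wj, o in zip(weights, outputs)]
        total = sum(weights)
        weights = [wj / total for wj in weights]
        chosen_log.append(chosen)
    return models, chosen_log
```

Calling `bnu_ensemble(pos, neg)` with lists of feature tuples returns the trained base models and the negative indices chosen in each round; `ensemble_score(models, x)` then gives the averaged decision value for a new example, with positive values indicating the concept.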