Abstract: Automatic concept learning from large-scale imbalanced data sets is a key issue in video semantic analysis and retrieval. Here, imbalance means that, for each concept, the negative examples in the training data far outnumber the positive examples. Existing methods generally under-sample the majority negative examples or over-sample the minority positive examples to balance the class distribution of the training data. These methods have two main drawbacks: (1) the degree of re-sampling, a key factor that greatly affects performance, must be fixed in advance in most existing methods, and a pre-fixed value is generally not the optimal choice; (2) many useful negative examples may be discarded during under-sampling. In addition, some works focus only on improving computational speed rather than accuracy. To address these issues, we propose a new approach and algorithm named AdaOUBoost (Adaptive Over-sampling and Under-sampling Boost). Its novelty lies mainly in adaptively over-sampling the minority positive examples and under-sampling the majority negative examples to form different sub-classifiers, and then combining these sub-classifiers according to their accuracy to create a strong classifier. The aim is to make full use of the whole training data and to improve the performance of the class-imbalance learning classifier. In AdaOUBoost, our clustering-based under-sampling method first divides the majority negative examples into several disjoint subsets. Then, for each subset of negative examples, we use the borderline-SMOTE (synthetic minority over-sampling technique) algorithm to over-sample the positive examples to different sizes, train a sub-classifier on each resulting training set, and obtain a classifier for that subset by fusing these sub-classifiers with different weights. Finally, we combine the classifiers from all subsets of negative examples to create a strong classifier.
We compare AdaOUBoost with state-of-the-art methods on the TRECVID 2008 benchmark with all 20 concepts, and the results show that AdaOUBoost achieves superior performance on large-scale imbalanced data sets.
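
The three stages described in the abstract (partition the negatives, over-sample the positives to several sizes per partition, fuse by accuracy) can be illustrated with a minimal sketch. This is not the authors' implementation, and several pieces are assumptions made only to keep the example self-contained: plain k-means stands in for the clustering-based under-sampling, random-pair interpolation stands in for borderline-SMOTE, a small logistic regression stands in for the unspecified sub-classifier, balanced accuracy on the training data is used as the fusion weight, and the final combination across subsets is a simple average. All function names and parameters (`train_ensemble`, `ratios`, `k`) are hypothetical.

```python
import numpy as np

def kmeans_partition(X, k, rng, iters=20):
    """Assign each row of X to one of k disjoint groups with plain k-means."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def interpolate_oversample(X_pos, n_new, rng):
    """SMOTE-style synthesis by interpolating random pairs of positives.
    Assumption: a simplification, not borderline-SMOTE."""
    i = rng.integers(0, len(X_pos), size=n_new)
    j = rng.integers(0, len(X_pos), size=n_new)
    t = rng.random((n_new, 1))
    return X_pos[i] + t * (X_pos[j] - X_pos[i])

def fit_logreg(X, y, steps=200, lr=0.1):
    """Tiny logistic regression by gradient descent (stand-in sub-classifier)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def score(w, X):
    """Linear decision value; positive means 'predict the concept'."""
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def balanced_accuracy(w, X_pos, X_neg):
    """Mean of true-positive rate and true-negative rate."""
    tpr = np.mean(score(w, X_pos) > 0)
    tnr = np.mean(score(w, X_neg) <= 0)
    return 0.5 * (tpr + tnr)

def train_ensemble(X_pos, X_neg, k=3, ratios=(0.0, 0.5, 1.0), seed=0):
    """Return one list of (weight, sub-classifier) pairs per negative subset."""
    rng = np.random.default_rng(seed)
    labels = kmeans_partition(X_neg, k, rng)
    groups = []
    for j in range(k):
        neg = X_neg[labels == j]
        if len(neg) == 0:
            continue
        members = []
        for r in ratios:
            # r = 0 keeps the original positives; r = 1 balances the two classes.
            n_new = int(r * max(len(neg) - len(X_pos), 0))
            pos = np.vstack([X_pos, interpolate_oversample(X_pos, n_new, rng)])
            X = np.vstack([pos, neg])
            y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
            w = fit_logreg(X, y)
            members.append((balanced_accuracy(w, X_pos, X_neg), w))
        groups.append(members)
    return groups

def predict(groups, X):
    """Accuracy-weighted vote inside each subset, then an average over subsets."""
    fused = []
    for members in groups:
        total = max(sum(a for a, _ in members), 1e-12)
        fused.append(sum(a * (score(w, X) > 0) for a, w in members) / total)
    return (np.mean(fused, axis=0) >= 0.5).astype(int)
```

Calling `train_ensemble(X_pos, X_neg)` with two NumPy arrays of feature vectors returns the nested list of weighted sub-classifiers, and `predict(groups, X)` returns 0/1 labels. Every negative example falls into exactly one subset, so none is discarded, which is the property the abstract highlights over plain under-sampling.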

Boosted Near-miss Under-sampling on SVM ensembles for concept detection in large-scale imbalanced datasets

Imbalanced Data Sets Classification Method Based on Over-Sampling Technique

A Novel SVM Modeling Approach For Highly Imbalanced And Overlapping Classification

An Unbalanced Dataset Classification Approach Based On V-Support Vector Machine

AdaOUBoost: adaptive over-sampling and under-sampling to boost the concept learning in large scale imbalanced data sets.

Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation

Learning concepts from large scale imbalanced data sets using support cluster machines.

Several SVM Ensemble Methods Integrated with Under-Sampling for Imbalanced Data Learning

CNUSVM: Hybrid CNN-Uneven SVM Model for Imbalanced Visual Learning

An efficient concept detection system via sparse ensemble learning.

Two-step ensemble under-sampling algorithm for massive imbalanced data classification

Mining Knowledge from Unbalanced Data Based on ν-Support Vector Machine

A DBN-based resampling SVM ensemble learning paradigm for credit classification with imbalanced data

Semisupervised SVM Batch Mode Active Learning with Applications to Image Retrieval

The Ensemble of Density-Sensitive SVDD Classifier Based on Maximum Soft Margin for Imbalanced Datasets.

Incorporating feature hierarchy and boosting to achieve more effective classifier training and concept-oriented video summarization and skimming

Combining Boundary Detector and SND-SVM for Fast Learning.

Video Concept Detection Using Support Vector Machines - TRECVID 2007 Evaluations

A Positive Sample Enhancement Algorithm with Fuzzy Nearest Neighbor Hybridization for Imbalance Data

Self-adaptive cost weights-based support vector machine cost-sensitive ensemble for imbalanced data classification