Abstract: Imbalanced data classification poses a major challenge for the data mining community. Although the standard support vector machine (SVM) is generally fairly robust on imbalanced datasets, it is an overall-accuracy-oriented algorithm, which leaves the final decision boundary biased toward the majority class. Ensemble methods have emerged as meta-techniques for improving the generalization performance of existing learning algorithms. In this paper, we propose a novel cost-sensitive SVM ensemble based on self-adaptive cost weights for imbalanced data classification. To keep the optimization objectives of the weak learners and the boosting scheme consistent, the proposed approach not only uses cost-sensitive SVMs as the base weak learners but also modifies the standard boosting scheme into a cost-sensitive one. To ensure that successive classifiers train on more minority instances, especially borderline minority instances, we also present a self-adaptive method for determining misclassification cost weights sequentially. At each boosting iteration, the method uses the previously obtained classifier to account for the different contributions of individual minority instances to the formation of the SVM classifier, which allows it to produce diverse classifiers and thus improve generalization performance. In the experiments, we analyze the effect of different parameters on performance and offer some suggestions for setting them. Extensive experimental results on a range of imbalanced datasets demonstrate that the proposed approach achieves better generalization performance in terms of G-Mean and F-Measure than other existing classification techniques for imbalanced datasets.
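The abstract reports results in terms of G-Mean and F-Measure rather than overall accuracy. As a minimal sketch of why, the snippet below computes both using their standard definitions (G-Mean as the geometric mean of sensitivity and specificity, F-Measure as the harmonic mean of precision and recall on the minority class). The function names, the choice of label `1` for the minority class, and the toy labels are assumptions made for illustration; they are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall on the minority class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 2 minority instances, 8 majority instances.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class is 80% accurate,
# yet both imbalance-aware metrics fall to zero.
all_majority = [0] * 10
print(g_mean(y_true, all_majority), f_measure(y_true, all_majority))

# A classifier that finds one minority instance at the cost of one false alarm.
mixed = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(g_mean(y_true, mixed), f_measure(y_true, mixed))
```

Because G-Mean multiplies the per-class recalls, a classifier that ignores the minority class scores zero no matter how accurate it is on the majority class, which is the bias the abstract attributes to accuracy-oriented SVMs.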

Learning Misclassification Costs for Imbalanced Classification on Gene Expression Data

A Method of Determining the Cost Weight with High-Dimensional Fitting

A method of classification accuracy calculation for cost sensitive algorithms

Cost Sensitive Support Vector Machines

SVM-Based Cost-sensitive Classification Algorithm with Error Cost and Class-dependent Reject Cost

SVM-based Cost Sensitive Mining

Self-adaptive cost weights-based support vector machine cost-sensitive ensemble for imbalanced data classification

A Genetic Programming Method for Classifier Construction and Cost Learning in High-Dimensional Unbalanced Classification

Binary Classification Algorithm with Class-Dependent Reject Cost

Adaptive Weight Optimization for Classification of Imbalanced Data

Using random forest for reliable classification and cost-sensitive learning for medical diagnosis

RUE: A Robust Personalized Cost Assignment Strategy for Class Imbalance Cost-sensitive Learning

The Influence of Class Imbalance on Cost-Sensitive Learning: An Empirical Study

Adaptive Cost-Sensitive Learning in Neural Networks for Misclassification Cost Problems

A Statistical Approach to Cost-Sensitive AdaBoost for Imbalanced Data Classification

Cost-sensitive hierarchical classification for imbalance classes

An adaptive Cost-sensitive Classifier

Learning with cost intervals

Instance-dependent misclassification cost-sensitive learning for default prediction

Multi-objective optimization-based adaptive class-specific cost extreme learning machine for imbalanced classification