Abstract: This paper presents a review of methods for class-imbalanced learning with the Support Vector Machine (SVM) and its variants. We first explain the structure of SVM and its variants and discuss their inefficiency in learning with class-imbalanced data sets. We introduce a hierarchical categorization of SVM-based models with respect to class-imbalanced learning. Specifically, we categorize SVM-based models into re-sampling, algorithmic, and fusion methods, and discuss the principles of the representative models in each category. In addition, we conduct a series of empirical evaluations to compare the performance of various representative SVM-based models in each category using benchmark imbalanced data sets, ranging from low to high imbalance ratios. Our findings reveal that while algorithmic methods are less time-consuming owing to no data pre-processing requirements, fusion methods, which combine both re-sampling and algorithmic approaches, generally perform the best, but with a higher computational load. A discussion on research gaps and future research directions is provided.

What problem does this paper attempt to address?

This paper addresses the problems that Support Vector Machines (SVMs) encounter when dealing with class-imbalanced data. Class imbalance means that, in a dataset, one or more target (majority) classes contain far more samples than the other (minority) classes. This imbalance can bias standard supervised classification algorithms towards the majority class, resulting in poor prediction performance for the minority class.

### Main Problems

1. **Decision Boundary Shift**: Because majority-class samples far outnumber minority-class samples, SVMs tend to learn a decision boundary that is biased towards the majority class. This makes minority-class samples prone to misclassification.
2. **Over-fitting and Poor Generalization Ability**: When dealing with imbalanced data, SVMs may over-fit the majority-class samples, resulting in poor performance on the test set, especially on minority-class samples.
3. **Waste of Computational Resources**: When the dataset is highly imbalanced, SVMs may spend a large share of their computational resources on majority-class samples while neglecting the minority-class samples.

### Solutions

To address these problems, the paper reviews the following categories of methods:

1. **Re-sampling Methods**:
   - **Under-sampling**: Reduce the number of majority-class samples to make the dataset more balanced.
   - **Over-sampling**: Increase the number of minority-class samples, for example, by synthesizing new minority-class samples.
   - **Combined Methods**: Combine under-sampling and over-sampling to achieve better balance.
2. **Algorithmic Methods**:
   - **Different Error Costs (DEC)**: Set different penalty parameters \( C \) for majority-class and minority-class samples to increase the importance of minority-class samples.
   - **Kernel Modifications**: Adjust the kernel function of SVMs to make them more suitable for imbalanced data.
3. **Fusion Methods**:
   - **Hybrid Methods**: Combine multiple techniques, such as re-sampling with algorithmic improvements.
   - **Ensemble Methods**: Use multiple classifiers for ensemble learning to improve overall performance.

### Contributions of the Paper

- **Comprehensive Review**: A detailed review of SVM-based class-imbalanced learning methods.
- **Hierarchical Categorization**: A hierarchical categorization framework that divides SVM-based methods into three categories: re-sampling, algorithmic, and fusion methods.
- **Empirical Evaluation**: A series of empirical evaluations comparing the performance of representative methods on benchmark datasets with different imbalance ratios.
- **Future Research Directions**: A discussion of the gaps in existing research and of future research directions.

In conclusion, this paper aims to provide a comprehensive and in-depth perspective to help researchers better understand and solve the challenges faced by SVMs when dealing with class-imbalanced data.
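The re-sampling and Different Error Costs (DEC) ideas summarized above can be illustrated with a minimal, standard-library sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names `random_undersample`, `random_oversample`, and `dec_penalties` are hypothetical, over-sampling is done by plain duplication rather than by synthesizing new samples, and the per-class penalty uses a common inverse-frequency heuristic, which may differ from the settings evaluated by the authors.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class is as small as the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        out.extend((x, y) for x in rng.sample(xs, target))
    return out


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in xs + extra)
    return out


def dec_penalties(labels, base_c=1.0):
    """Per-class penalty C_k = base_c * n / (k * n_k).

    Assumed heuristic: rarer classes receive a larger misclassification
    penalty, in proportion to the inverse of their frequency.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {y: base_c * n / (k * c) for y, c in counts.items()}
```

With 90 majority and 10 minority labels, `dec_penalties` assigns the minority class a penalty nine times larger than the majority class. The returned dictionary could then be passed to an SVM implementation that accepts per-class weights, while the two re-sampling helpers return balanced lists of `(sample, label)` pairs.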

Methods for Class-Imbalanced Learning with Support Vector Machines: A Review and an Empirical Evaluation
