Methods for Class-Imbalanced Learning with Support Vector Machines: A Review and an Empirical Evaluation

Salim Rezvani, Farhad Pourpanah, Chee Peng Lim, Q. M. Jonathan Wu
2024-06-12
Abstract: This paper presents a review of methods for class-imbalanced learning with the Support Vector Machine (SVM) and its variants. We first explain the structure of SVM and its variants and discuss their shortcomings in learning from class-imbalanced data sets. We introduce a hierarchical categorization of SVM-based models with respect to class-imbalanced learning. Specifically, we categorize SVM-based models into re-sampling, algorithmic, and fusion methods, and discuss the principles of the representative models in each category. In addition, we conduct a series of empirical evaluations to compare the performance of various representative SVM-based models in each category on benchmark imbalanced data sets with imbalance ratios ranging from low to high. Our findings reveal that algorithmic methods are less time-consuming because they require no data pre-processing, whereas fusion methods, which combine re-sampling and algorithmic approaches, generally perform best but at a higher computational cost. We also discuss research gaps and future research directions.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the problems that Support Vector Machines (SVMs) encounter when learning from class-imbalanced data. Class imbalance means that one or more classes in a dataset (the majority classes) contain far more samples than the others (the minority classes). Such imbalance biases standard supervised classification algorithms toward the majority class, which leads to poor predictive performance on the minority class.

### Main Problems

1. **Decision Boundary Shift**: Because majority-class samples far outnumber minority-class samples, SVMs tend to learn a decision boundary that favors the majority class, i.e., one that lies too close to the minority class, so minority-class samples are easily misclassified.
2. **Over-fitting and Poor Generalization**: On imbalanced data, SVMs may over-fit the majority-class samples, so the model performs poorly on the test set, especially on minority-class samples.
3. **Wasted Computational Resources**: When the dataset is highly imbalanced, most of the training effort goes into processing majority-class samples, while the minority-class samples, which are usually the ones of interest, receive comparatively little weight.

### Solutions

To address these problems, the paper reviews three families of methods:

1. **Re-sampling Methods**:
   - **Under-sampling**: Remove majority-class samples to make the dataset more balanced.
   - **Over-sampling**: Add minority-class samples, for example by synthesizing new ones.
   - **Combined Methods**: Combine under-sampling and over-sampling to achieve better balance.
2. **Algorithmic Methods**:
   - **Different Error Costs (DEC)**: Set different penalty parameters \( C \) for majority-class and minority-class samples, penalizing errors on minority-class samples more heavily so that they carry more weight during training.
   - **Kernel Modifications**: Adjust the SVM kernel function to make it more suitable for imbalanced data.
3. **Fusion Methods**:
   - **Hybrid Methods**: Combine multiple techniques, such as re-sampling together with algorithmic modifications.
   - **Ensemble Methods**: Combine the outputs of multiple classifiers to improve overall performance.

### Contributions of the Paper

- **Comprehensive Review**: A detailed review of SVM-based methods for class-imbalanced learning.
- **Hierarchical Categorization**: A hierarchical categorization that divides SVM-based methods into three categories: re-sampling, algorithmic, and fusion methods.
- **Empirical Evaluation**: A series of empirical evaluations comparing the performance of representative methods on benchmark datasets with imbalance ratios ranging from low to high.
- **Future Research Directions**: A discussion of gaps in existing research and of directions for future work.

In conclusion, this paper aims to provide a comprehensive, in-depth perspective that helps researchers understand and address the challenges SVMs face when learning from class-imbalanced data.
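The re-sampling strategies described above can be illustrated with a minimal, dependency-free sketch. It assumes binary labels, samples given as lists of floats, and hypothetical helper names (`random_undersample`, `smote_like_oversample`). The over-sampling step uses SMOTE-style interpolation between a minority sample and one of its k nearest minority neighbours; it is chosen only as a well-known example of synthesizing minority samples, not as the specific method evaluated in the paper.

```python
import random


def random_undersample(X, y, majority_label, seed=0):
    """Randomly discard majority-class samples until both classes are equal in size."""
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t != majority_label]
    majority = [(x, t) for x, t in zip(X, y) if t == majority_label]
    # Sample without replacement; keep at most as many majority samples as minority ones.
    kept = rng.sample(majority, min(len(minority), len(majority)))
    data = minority + kept
    return [x for x, _ in data], [t for _, t in data]


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples, each on the segment between a
    randomly chosen minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples (hypothetical helper, SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        base = X_min[i]
        # Other minority samples, nearest first.
        others = sorted((j for j in range(len(X_min)) if j != i),
                        key=lambda j: squared_distance(base, X_min[j]))
        neighbour = X_min[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy data: 3 minority (+1) points and 9 majority (-1) points.
    X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]] + [[3.0 + i, 3.0] for i in range(9)]
    y = [1, 1, 1] + [-1] * 9
    X_u, y_u = random_undersample(X, y, majority_label=-1)
    print(y_u.count(1), y_u.count(-1))  # 3 3
    X_new = smote_like_oversample(X[:3], n_new=6)
    print(len(X_new))  # 6
```

Applying both functions to the same data set, for instance over-sampling the minority class part of the way and under-sampling the majority class for the rest, gives a simple instance of the combined re-sampling methods.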
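Different Error Costs can be written as a soft-margin SVM whose slack penalty depends on the class. Assuming the minority class is labelled \( +1 \) and the majority class \( -1 \), a common way to state the DEC primal problem is

\[
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C^{+} \sum_{i:\, y_i = +1} \xi_i + C^{-} \sum_{i:\, y_i = -1} \xi_i
\quad \text{s.t.} \quad y_i\bigl(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\bigr) \ge 1 - \xi_i, \;\; \xi_i \ge 0,
\]

with \( C^{+} > C^{-} \). The sketch below is a minimal illustration under further assumptions: a linear kernel (\( \phi(\mathbf{x}) = \mathbf{x} \)), the equivalent hinge-loss form of the objective, plain full-batch subgradient descent instead of a dedicated QP solver, and the heuristic \( C^{+}/C^{-} = N^{-}/N^{+} \) (majority count over minority count). All function names and hyperparameter values are hypothetical.

```python
def dec_costs(y, C=1.0):
    """Assumed heuristic: C for the majority (-1) class, C * N-/N+ for the minority (+1) class."""
    n_pos = sum(1 for t in y if t == 1)
    n_neg = len(y) - n_pos
    return C * n_neg / n_pos, C


def dec_objective(w, b, X, y, c_pos, c_neg):
    """Hinge form of the DEC objective:
    0.5*||w||^2 + sum_i C_{y_i} * max(0, 1 - y_i (w.x_i + b))."""
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for x, t in zip(X, y):
        margin = t * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        loss += (c_pos if t == 1 else c_neg) * max(0.0, 1.0 - margin)
    return reg + loss


def train_dec_linear_svm(X, y, c_pos, c_neg, epochs=2000, lr=0.01):
    """Full-batch subgradient descent on the DEC objective (linear kernel, sketch only)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        gw = list(w)  # gradient of the 0.5*||w||^2 term
        gb = 0.0      # the bias is not regularized
        for x, t in zip(X, y):
            c = c_pos if t == 1 else c_neg
            if t * (sum(wj * xj for wj, xj in zip(w, x)) + b) < 1.0:
                # Active hinge term: its subgradient is -C_i * y_i * x_i (and -C_i * y_i for b).
                for j, xj in enumerate(x):
                    gw[j] -= c * t * xj
                gb -= c * t
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    # Toy 1-D data: one minority (+1) point and three majority (-1) points.
    X = [[2.0], [-1.0], [-2.0], [-3.0]]
    y = [1, -1, -1, -1]
    c_pos, c_neg = dec_costs(y)  # (3.0, 1.0)
    w, b = train_dec_linear_svm(X, y, c_pos, c_neg)
    print([predict(w, b, x) for x in X])  # [1, -1, -1, -1]
```

In practice a library solver would normally be used instead; for example, scikit-learn's `SVC` accepts a `class_weight` argument that multiplies `C` per class, which applies the same per-class penalty idea with any kernel.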
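Fusion methods combine re-sampling with algorithmic changes or with multiple classifiers. A minimal sketch of one generic scheme, an under-sampling SVM ensemble, trains several linear SVMs, each on all minority samples plus an equally sized random subset of majority samples, and combines them by majority vote. It is an illustration under stated assumptions (labels \( +1 \)/\( -1 \) with \( +1 \) as the minority class, a linear kernel, a subgradient-descent trainer, hypothetical function names and hyperparameters), not one of the representative models evaluated in the paper.

```python
import random


def train_linear_svm(X, y, C=1.0, epochs=2000, lr=0.01):
    """Plain soft-margin linear SVM trained by full-batch subgradient descent on
    0.5*||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b))."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0
        for x, t in zip(X, y):
            if t * (sum(wj * xj for wj, xj in zip(w, x)) + b) < 1.0:
                for j, xj in enumerate(x):
                    gw[j] -= C * t * xj
                gb -= C * t
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def undersampled_svm_ensemble(X, y, n_models=5, seed=0):
    """Hypothetical fusion scheme: each member SVM sees all minority (+1) samples
    plus an equally sized random subset of majority (-1) samples."""
    rng = random.Random(seed)
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == -1]
    models = []
    for _ in range(n_models):
        sub = rng.sample(neg, min(len(pos), len(neg)))  # balanced subset
        labels = [1] * len(pos) + [-1] * len(sub)
        models.append(train_linear_svm(pos + sub, labels))
    return models


def ensemble_predict(models, x):
    """Majority vote of the member SVMs (a tie goes to the minority class)."""
    votes = sum(1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
                for w, b in models)
    return 1 if votes >= 0 else -1


if __name__ == "__main__":
    # Toy 1-D data: 2 minority (+1) points and 6 majority (-1) points.
    X = [[3.0], [4.0]] + [[-float(k)] for k in range(1, 7)]
    y = [1, 1] + [-1] * 6
    models = undersampled_svm_ensemble(X, y, n_models=5)
    print(ensemble_predict(models, [5.0]), ensemble_predict(models, [-5.0]))  # 1 -1
```

Because each member sees a different majority subset, the ensemble as a whole uses more of the majority data than a single under-sampled SVM would, while every member is still trained on balanced data; the price is training several models, which matches the higher computational load reported for fusion methods.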