Cost-sensitive probabilistic predictions for support vector machines

Sandra Benítez-Peña, Rafael Blanquero, Emilio Carrizosa, Pepa Ramírez-Cobo

DOI: https://doi.org/10.1016/j.ejor.2023.09.027

2023-10-09

Abstract: Support vector machines (SVMs) are widely used and constitute one of the best examined and used machine learning models for two-class classification. Classification in SVM is based on a score procedure, yielding a deterministic classification rule, which can be transformed into a probabilistic rule (as implemented in off-the-shelf SVM libraries), but is not probabilistic in nature. On the other hand, the tuning of the regularization parameters in SVM is known to imply a high computational effort and generates pieces of information that are not fully exploited, not being used to build a probabilistic classification rule. In this paper we propose a novel approach to generate probabilistic outputs for the SVM. The new method has the following three properties. First, it is designed to be cost-sensitive, and thus the different importance of sensitivity (or true positive rate, TPR) and specificity (true negative rate, TNR) is readily accommodated in the model. As a result, the model can deal with imbalanced datasets, which are common in operational business problems such as churn prediction or credit scoring. Second, the SVM is embedded in an ensemble method to improve its performance, making use of the valuable information generated in the parameter tuning process. Finally, the probability estimation is done via bootstrap estimates, avoiding the use of parametric models as in competing approaches. Numerical tests on a wide range of datasets show the advantages of our approach over benchmark procedures.

Machine Learning

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses two limitations of Support Vector Machines (SVMs) in classification tasks: their outputs are scores rather than probabilities, and they do not by default account for the unequal importance of the two classes in imbalanced datasets.

#### Main Contributions

1. **Cost-Sensitive Probabilistic Prediction**:
   - A new method is proposed to generate probabilistic outputs for SVMs, and it is cost-sensitive by design. The different importance of sensitivity (true positive rate, TPR) and specificity (true negative rate, TNR) can be built into the model.
   - This allows the model to handle imbalanced datasets, which are common in business problems such as customer churn prediction or credit scoring.
2. **Ensemble Embedding**:
   - The SVM is embedded into an ensemble method to improve performance and to reuse the information generated during parameter tuning, which is otherwise discarded.
3. **Non-Parametric Probability Estimation**:
   - Probabilities are estimated through bootstrap estimates, avoiding the parametric models that competing approaches rely on.

#### Practical Problems Solved

- In many practical applications, such as customer churn prediction or credit scoring, imbalanced datasets are very common, and a classifier tuned for overall accuracy tends to neglect the minority class.
- Cost sensitivity lets the relative importance of TPR and TNR be set explicitly, so that the more costly type of error can be given more weight.

#### Experimental Validation

- The paper evaluates the proposed method through numerical tests on a wide range of datasets and compares it with benchmark procedures.
- According to the abstract, the results show advantages of the new approach over those benchmarks.

In summary, this paper aims to improve the probabilistic predictions of SVMs on imbalanced datasets by combining cost sensitivity, an ensemble built from the parameter tuning process, and non-parametric bootstrap probability estimation, with business applications such as churn prediction and credit scoring in mind.
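
The three ideas summarized above (class-dependent costs, an ensemble over the regularization grid, and bootstrap-based probabilities) can be illustrated with a minimal sketch. It is not the authors' algorithm: the linear SVM trained by subgradient descent is a simple stand-in, the probability is taken as the plain fraction of ensemble members voting for the positive class, and every name here (`train_linear_svm`, `bootstrap_probabilities`, `tpr_tnr`, `pos_weight`, `C_grid`) is hypothetical.

```python
import random

def score(w, b, x):
    """Linear SVM score w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, C, pos_weight, epochs=30, lr=0.01, seed=0):
    """Stand-in trainer (assumption): stochastic subgradient descent on
    0.5*||w||^2 + sum_i cost_i * hinge(y_i * score(x_i)),
    where cost_i = C * pos_weight for positives and C for negatives."""
    rng = random.Random(seed)
    n = len(X)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            cost = C * (pos_weight if y[i] == 1 else 1.0)
            violated = y[i] * score(w, b, X[i]) < 1.0
            # Regularizer step: shrink w slightly on every sample.
            w = [wj - lr * wj / n for wj in w]
            if violated:
                # Hinge step: push the margin of sample i upward.
                w = [wj + lr * cost * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * cost * y[i]
    return w, b

def bootstrap_probabilities(X, y, X_new, C_grid=(0.1, 1.0, 10.0),
                            pos_weight=1.0, n_boot=10, seed=0):
    """Estimate P(y=+1 | x) as the share of positive votes among SVMs
    trained on bootstrap resamples, one SVM per value in C_grid."""
    rng = random.Random(seed)
    n = len(X)
    votes = [0] * len(X_new)
    n_models = 0
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        Xb = [X[i] for i in sample]
        yb = [y[i] for i in sample]
        if len(set(yb)) < 2:
            continue  # skip degenerate resamples containing a single class
        for C in C_grid:
            w, b = train_linear_svm(Xb, yb, C, pos_weight,
                                    seed=rng.randrange(10**6))
            n_models += 1
            for j, x in enumerate(X_new):
                if score(w, b, x) > 0:
                    votes[j] += 1
    if n_models == 0:
        raise ValueError("no usable bootstrap resample")
    return [v / n_models for v in votes]

def tpr_tnr(y_true, probs, threshold=0.5):
    """Sensitivity (TPR) and specificity (TNR) at a probability threshold."""
    pred = [1 if p >= threshold else -1 for p in probs]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, pred) if t == -1 and p == -1)
    return tp / y_true.count(1), tn / y_true.count(-1)

# Toy data (assumption): two well-separated Gaussian clusters.
data_rng = random.Random(42)
X = [[data_rng.gauss(2, 0.5), data_rng.gauss(2, 0.5)] for _ in range(20)] + \
    [[data_rng.gauss(-2, 0.5), data_rng.gauss(-2, 0.5)] for _ in range(20)]
y = [1] * 20 + [-1] * 20

# pos_weight > 1 makes errors on the positive class more costly.
probs = bootstrap_probabilities(X, y, [[3.0, 3.0], [-3.0, -3.0]], pos_weight=2.0)
print(probs)
```

Raising `pos_weight` makes each ensemble member penalize misclassified positives more heavily, which is one simple way to trade TNR for TPR. The paper's own formulation of cost sensitivity may differ from this weighting scheme.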

Support Vector Machines Ensemble With Optimizing Weights By Genetic Algorithm

On support vector machines under a multiple-cost scenario

Cost Sensitive Support Vector Machines

SVM-Based Cost-sensitive Classification Algorithm with Error Cost and Class-dependent Reject Cost

SVM-based Cost Sensitive Mining

Probabilistic Classification Vector Machines

Non-Parametric Estimation Of Svm Probabilistic Outputs And Its Application On Handwritten Digital Recognition

Cost-sensitive Feature Selection for Support Vector Machines

Probabilistic Safety Regions Via Finite Families of Scalable Classifiers

Probabilistic support vector machine output adjusting for sampling bias

Hybrid SVM algorithm oriented to classifying imbalanced datasets

Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers

Sparse Learning and Class Probability Estimation with Weighted Support Vector Machines

Financial time series forecasting using support vector machines

Self-adaptive cost weights-based support vector machine cost-sensitive ensemble for imbalanced data classification

Practical Bayesian support vector regression for financial time series prediction and market condition change detection

Predicting Criminal Recidivism with Support Vector Machine

Weighted Posterior Probability Output for Support Vector Machines

Automatic optimized support vector regression for financial data prediction

Boosting Support Vector Machines Successfully