A mixed integer linear programming support vector machine for cost-effective feature selection

In Gyu Lee, Qianqian Zhang, Sang Won Yoon, Daehan Won
DOI: https://doi.org/10.1016/j.knosys.2020.106145
IF: 8.139
2020-01-01
Knowledge-Based Systems
Abstract: In the era of big data, feature selection is indispensable as a dimensionality reduction technique that lowers data complexity and improves machine learning performance. However, traditional feature selection methods focus mainly on classification performance and ignore the costs associated with features, e.g., the price, risk, and computational complexity of feature acquisition. In this research, we extend the ℓ1-norm support vector machine (ℓ1-SVM) to account for feature costs by incorporating a budget constraint, so that classification accuracy is preserved with the least expensive features. Furthermore, we formulate its robust counterpart to address uncertainty in the feature costs. To enhance computational efficiency, we also develop an algorithm that tightens the bound on the weight vector in the budget constraint. In an experimental study on a variety of benchmark and synthetic datasets, our proposed mixed integer linear programming (MILP) models achieve competitive predictive and economic performance. In addition, the algorithm that tightens the budget constraint helps curtail computational complexity.
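For orientation, a budget-constrained ℓ1-SVM of the kind the abstract describes can be sketched as the MILP below. This is an illustrative reading under stated assumptions, not the paper's exact formulation: the symbols are all introduced here, namely the split weights w_j^+ and w_j^-, slack variables ξ_i, binary selection indicators z_j, feature costs c_j, budget B, trade-off parameter C, and the big-M bound M. The abstract gives no notation of its own.

```latex
% Sketch only: all symbols are assumptions, not taken from the paper.
% n samples (x_i, y_i) with y_i in {-1, +1}, d features, w_j = w_j^+ - w_j^-.
\begin{aligned}
\min_{w^{+},\, w^{-},\, b,\, \xi,\, z} \quad
  & \sum_{j=1}^{d} \left( w_j^{+} + w_j^{-} \right) + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad
  & y_i \Big( \sum_{j=1}^{d} \left( w_j^{+} - w_j^{-} \right) x_{ij} + b \Big) \ge 1 - \xi_i,
    && i = 1, \dots, n, \\
  & w_j^{+} + w_j^{-} \le M z_j,
    && j = 1, \dots, d, \\
  & \sum_{j=1}^{d} c_j z_j \le B, \\
  & w_j^{+},\, w_j^{-} \ge 0, \quad \xi_i \ge 0, \quad z_j \in \{0, 1\}.
\end{aligned}
```

In this sketch, the constraint with M forces w_j = 0 whenever feature j is not purchased (z_j = 0), and the budget constraint caps the total cost of the purchased features. A loose M weakens the linear relaxation of such a model, which is the usual motivation for tightening the bound on the weight vector.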