Mining Knowledge from Unbalanced Data: Effect of Class Distribution on SVM Classification

ZHENG En-hui, LI Ping, SONG Zhi-huan
DOI: https://doi.org/10.3969/j.issn.1002-0411.2005.06.013
2005-01-01
Information and Control
Abstract: Based on standard support vector machines (SVMs), bounds on both the number (and rate) of support vectors and the number (and rate) of boundary support vectors are proposed and proved. The bounds are then extended to the positive class and the negative class separately. On the basis of these bounds, it is proved that the positive class yields poorer classification and predictive accuracy than the negative class. Simulation results on both artificial and benchmark data sets show that the conclusions and the method of this paper are correct and effective.
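Bounds of the kind described in the abstract can be illustrated with a standard argument from the soft-margin C-SVM dual. This is a sketch under that assumption; the paper's own statements and proofs may differ. The dual constraints are 0 <= alpha_i <= C and sum_i alpha_i y_i = 0, so both classes carry the same total S = sum over y_i=+1 of alpha_i = sum over y_i=-1 of alpha_i. Every support vector (alpha_i > 0) contributes at most C and every boundary support vector (alpha_i = C) contributes exactly C, so for a class c with l_c training points: N_BSV(c)/l_c <= S/(C*l_c) <= N_SV(c)/l_c. By the KKT conditions every misclassified training point is a boundary support vector, so S/(C*l_c) also upper-bounds that class's training error rate. If the positive class is the smaller one (the usual setting for unbalanced data, assumed here), its bound S/(C*l_+) is the larger of the two. The helper `class_sv_bounds` below is hypothetical; it only computes these quantities from a given dual solution and does not train an SVM.

```python
from typing import Dict, Sequence


def class_sv_bounds(
    alpha: Sequence[float], y: Sequence[int], C: float, tol: float = 1e-8
) -> Dict[int, Dict[str, float]]:
    """Per-class boundary-SV rate, the shared bound S/(C*l_c), and SV rate.

    Assumes a standard soft-margin C-SVM dual solution: labels in {+1, -1},
    0 <= alpha_i <= C and sum(alpha_i * y_i) == 0 (up to tol).
    """
    if abs(sum(a * label for a, label in zip(alpha, y))) > tol:
        raise ValueError("alpha violates the dual constraint sum(alpha_i * y_i) = 0")
    # S: total dual weight of the positive class (equal to the negative class's)
    s = sum(a for a, label in zip(alpha, y) if label == 1)
    result: Dict[int, Dict[str, float]] = {}
    for c in (1, -1):
        idx = [i for i, label in enumerate(y) if label == c]
        l_c = len(idx)
        n_sv = sum(1 for i in idx if alpha[i] > tol)        # support vectors
        n_bsv = sum(1 for i in idx if alpha[i] >= C - tol)  # boundary SVs (alpha = C)
        result[c] = {
            "bsv_rate": n_bsv / l_c,    # lower end of the sandwich
            "bound": s / (C * l_c),     # S / (C * l_c)
            "sv_rate": n_sv / l_c,      # upper end of the sandwich
        }
    return result


if __name__ == "__main__":
    # Toy dual solution: 2 positive (minority) and 8 negative points, C = 1.
    alpha = [1.0, 0.5, 1.0, 0.5, 0, 0, 0, 0, 0, 0]
    y = [1, 1, -1, -1, -1, -1, -1, -1, -1, -1]
    for label, stats in class_sv_bounds(alpha, y, C=1.0).items():
        print(label, stats)
```

With these toy values the minority class gets a bound of 0.75 and the majority class 0.1875, and in each class the boundary-SV rate is at most the bound, which is at most the SV rate.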