Mining Knowledge from Unbalanced Data Based on ν-Support Vector Machine

ZHENG En-hui, XU Hong, LI Ping, SONG Zhi-huan
DOI: https://doi.org/10.3785/j.issn.1008-973x.2006.10.005
2006-01-01
Abstract: Learning from data sets that contain very few instances of the positive class usually produces biased classifiers that have higher predictive accuracy on the negative class than on the positive class (usually the more important or interesting class). Based on the ν-SVM and its derivation, bounds on both the number of support vectors (SVs) and the number of bounded support vectors (BSVs) were proposed. Bounds on both the SV rate and the BSV rate were then presented and proved. These bounds were then extended to the positive class and the negative class separately. Finally, it was proved that the SV rate and BSV rate of the positive class are higher than those of the negative class, and that the positive class yields poorer classification and predictive accuracy than the negative class. An experimental study on the German credit and Heart disease data sets showed that the hypothesis is reasonable and the conclusions are correct.
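
The per-class argument summarized in the abstract can be illustrated with a small calculation. The sketch below assumes the standard ν-SVC dual (0 ≤ α_i ≤ 1/l, Σ α_i y_i = 0, Σ α_i = ν at an optimum with positive margin), which is not spelled out in the abstract and may differ in detail from the paper's own statement. Under that assumption each class carries a total dual weight of ν/2, so for a class c with l_c training points, BSV rate of c ≤ ν·l / (2·l_c) ≤ SV rate of c. The function name and the class sizes are hypothetical and chosen only for illustration.

```python
def per_class_nu_thresholds(nu, n_pos, n_neg):
    """Return (t_pos, t_neg), where t_c = nu * l / (2 * l_c).

    Assumption (standard nu-SVC dual, not quoted from the paper):
    for each class c, BSV rate of c <= t_c <= SV rate of c.
    """
    if n_pos <= 0 or n_neg <= 0:
        raise ValueError("both classes need at least one training point")
    n = n_pos + n_neg
    # The nu-SVC dual is only feasible for nu <= 2 * min(l+, l-) / l.
    nu_max = 2 * min(n_pos, n_neg) / n
    if not 0 < nu <= nu_max:
        raise ValueError(f"nu must lie in (0, {nu_max:.4f}] for these class sizes")
    return nu * n / (2 * n_pos), nu * n / (2 * n_neg)


# Illustrative class sizes (hypothetical, not taken from the paper):
# 300 positive (minority) and 700 negative (majority) training points.
t_pos, t_neg = per_class_nu_thresholds(nu=0.3, n_pos=300, n_neg=700)
print(f"positive class: BSV rate <= {t_pos:.3f} <= SV rate")
print(f"negative class: BSV rate <= {t_neg:.3f} <= SV rate")
```

Because l_+ < l_-, the threshold for the positive class is always the larger of the two. This is consistent with the abstract's conclusion that the minority class has the higher SV rate, and it permits a higher BSV rate for that class.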