Probabilistic support vector machine output adjusting for sampling bias

Chunyu Yang, Jie Zhou
2008-01-01
Abstract: In some real-world classification tasks, the classifier may be trained on a data set that does not reflect the class distribution of the real data. Such sampling bias, or virtual concept drift, may seriously degrade classification accuracy. Previous research on this topic mainly concerns classifiers with explicit a posteriori probability outputs, for which there is a framework that adjusts the original classifier using the Expectation Maximization (EM) algorithm. The margin-based Support Vector Machine (SVM) classifier has not been studied under this framework because it lacks a probabilistic output. In this paper, we discuss the probabilistic output of SVM and propose a Gaussian Mixture Model (GMM) to approximate the class-conditional distribution of the margin, so that the classifier can be adjusted within the EM framework. Experimental results on standard machine learning data sets show that the proposed algorithm can improve classification accuracy on most of these problems. It performs especially well on data sets with low classification accuracy.
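
The EM adjustment mentioned in the abstract can be illustrated with a minimal sketch. It assumes the classifier already outputs posterior probabilities and that the training class priors are known; it does not implement the GMM over SVM margins that the paper proposes, and the names `adjust_priors_em`, `posteriors`, and `train_priors` are hypothetical, chosen only for this illustration.

```python
def adjust_priors_em(posteriors, train_priors, n_iter=100, tol=1e-8):
    """Re-estimate class priors on unlabeled data and rescale posteriors.

    posteriors   : list of rows; each row holds the classifier's
                   p(class | x) for one unlabeled sample (assumed input)
    train_priors : class priors of the training set (assumed known)
    Returns (estimated_priors, adjusted_posteriors).
    """
    n_classes = len(train_priors)
    priors = list(train_priors)
    adjusted = [list(row) for row in posteriors]
    for _ in range(n_iter):
        # E-step: reweight each posterior by (current prior / training prior)
        # and renormalize so every row sums to 1.
        adjusted = []
        for row in posteriors:
            weighted = [row[k] * priors[k] / train_priors[k]
                        for k in range(n_classes)]
            total = sum(weighted)
            adjusted.append([w / total for w in weighted])
        # M-step: the new prior of a class is its mean adjusted posterior.
        new_priors = [sum(r[k] for r in adjusted) / len(adjusted)
                      for k in range(n_classes)]
        change = max(abs(a - b) for a, b in zip(new_priors, priors))
        priors = new_priors
        if change < tol:
            break
    # `adjusted` comes from the last E-step that was run.
    return priors, adjusted


if __name__ == "__main__":
    # Toy posteriors from a classifier trained with balanced classes.
    toy = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
    est_priors, est_post = adjust_priors_em(toy, [0.5, 0.5])
    print(est_priors)
```

For an SVM, the posteriors fed into such a routine would first have to be derived from the margin, which is the step the paper addresses with its GMM approximation.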