Investigation on DirectSVM Algorithm for Support Vector Machine
Zhao Ying, Liu Hong-Xing, Gao Dun-Tang
DOI: https://doi.org/10.3321/j.issn:0469-5097.2006.04.006
2006-01-01
Abstract: Support vector machines (SVMs), which are based on the principle of structural risk minimization, are by far the most sophisticated and powerful classifiers available today. Training an SVM classifier essentially amounts to solving a quadratic programming (QP) problem. Among SVM training algorithms, Sequential Minimal Optimization (SMO) and the Nearest Point Algorithm (NPA) have attracted much attention. Platt's SMO algorithm is a fast iterative algorithm that divides the large-scale QP problem into a series of small-scale QP sub-problems, thus overcoming the difficulties of the original QP problem, which requires enormous matrix storage and expensive matrix operations. NPA transforms a particular SVM classification formulation into the problem of computing the nearest points between the two closed convex polytopes formed by the two training sample sets in the hidden feature space. DirectSVM is a very simple iterative algorithm for constructing support vector machine classifiers, and it is geometrically highly intuitive. It is based on the proposition that the two closest training points of opposite classes in a training set are support vectors. Other support vectors are found using the following conjecture: the training point that maximally violates the current hyperplane is also a support vector. In the linearly separable case, the DirectSVM algorithm proceeds as follows: first, the two nearest training samples of opposite classes are taken as the initial support vectors, and the initial classification hyperplane is obtained from these two support vectors; then, the training point that maximally violates the current hyperplane is taken as a new support vector, and the classification hyperplane is modified accordingly; the support vector set and the hyperplane are updated iteratively in this way until no sample is closer to the classification hyperplane than the support vectors, at which point the optimal hyperplane is obtained. In this paper, the algorithm is tested on several instances, and a limitation of the algorithm is identified. The cause of the limitation and the appropriate way to use the algorithm are discussed. The conclusion is that DirectSVM is not always reliable, but it can be used to construct an approximately optimal hyperplane. It can also be used to search for candidate support vector sets, which can then serve as new training sample sets for other classical SVM algorithms such as SMO and NPA.
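The first two steps described in the abstract (choosing the closest pair of opposite-class samples, then searching for the sample that maximally violates the resulting hyperplane) can be illustrated with a minimal Python sketch. It assumes linearly separable data in input space with no kernel, and takes the initial hyperplane to be the perpendicular bisector of the closest pair, scaled so that the two points have functional margin +1 and -1. The function names and the toy data are hypothetical and do not come from the paper, and the rule for updating the hyperplane after a violator is found is not given in the abstract, so it is deliberately left out.

```python
from itertools import product


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def closest_opposite_pair(pos, neg):
    """Return the (positive, negative) pair of samples with the smallest distance."""
    return min(product(pos, neg), key=lambda pair: sq_dist(*pair))


def bisecting_hyperplane(p, n):
    """Return (w, b) such that w.p + b = +1 and w.n + b = -1.

    Assumption: the initial hyperplane is the perpendicular bisector of p and n.
    """
    diff = [pi - ni for pi, ni in zip(p, n)]
    scale = 2.0 / sum(d * d for d in diff)
    w = [scale * d for d in diff]
    # The hyperplane passes through the midpoint of p and n.
    b = -sum(wi * (pi + ni) / 2.0 for wi, pi, ni in zip(w, p, n))
    return w, b


def max_violator(samples, labels, w, b, tol=1e-9):
    """Return (index, margin) of the sample with the smallest functional margin
    y * (w.x + b), or None if every sample has margin of at least 1."""
    margins = [
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        for x, y in zip(samples, labels)
    ]
    i = min(range(len(margins)), key=margins.__getitem__)
    return (i, margins[i]) if margins[i] < 1.0 - tol else None


# Hypothetical toy data: two positive and two negative samples in the plane.
pos = [(1.0, 1.0), (0.5, -3.0)]
neg = [(-1.0, 1.0), (-3.0, 0.0)]

p, n = closest_opposite_pair(pos, neg)   # initial support vectors
w, b = bisecting_hyperplane(p, n)        # initial hyperplane
violator = max_violator(pos + neg, [1, 1, -1, -1], w, b)
print(p, n, w, b, violator)
```

In this toy set the second positive sample lies inside the margin of the initial hyperplane, so it is reported as the next candidate support vector. A full implementation would then adjust the hyperplane and repeat the search until `max_violator` returns `None`.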