Fast SVM training using edge detection on very large datasets
Boyang Li, Qiangwei Wang, Jinglu Hu
DOI: https://doi.org/10.1002/tee.21844
IF: 0.923
2013-04-10
IEEJ Transactions on Electrical and Electronic Engineering
Abstract: In a standard support vector machine (SVM), the training process has O(n^3) time and O(n^2) space complexity, where n is the size of the training dataset, so training on very large datasets is computationally infeasible. Reducing the size of the training dataset is a natural way to address this problem. An SVM classifier is constructed from the training samples called support vectors (SVs), which lie close to the separation boundary. Removing samples that are unlikely to become SVs may therefore have no effect on the resulting separation boundary; in other words, only the samples that are likely to be SVs need to be retained. A method based on edge detection techniques is proposed to extract such samples near the separation boundary. To avoid overfitting, a clustering algorithm is also used to preserve the distribution properties of the training dataset. The samples selected by the edge detector, together with the cluster centroids, form the reconstructed training dataset. In the proposed approach, the edge detection technique captures the local properties around the separation boundary, while the clustering algorithm preserves the properties of the data as a whole. The reconstructed training dataset, which contains fewer samples, makes the training process very fast without degrading classification accuracy. © 2013 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
engineering, electrical & electronic
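
The data-reduction idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the edge detector or the clustering algorithm, so the code below substitutes two stand-ins that are assumptions, not the authors' method: a sample counts as an "edge" sample if any of its k nearest neighbours carries a different label, and plain k-means (Lloyd's algorithm) supplies per-class centroids. The function names `edge_samples`, `kmeans_centroids`, and `reduce_training_set` and the parameters `k` and `clusters_per_class` are hypothetical.

```python
import math
import random


def edge_samples(X, y, k=5):
    """Indices of samples with an opposite-class point among their k nearest neighbours.

    Assumption: this label-disagreement test stands in for the paper's edge detector.
    """
    edge = []
    for i, xi in enumerate(X):
        # Brute-force neighbour search, kept simple for illustration only.
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        if any(y[j] != y[i] for j in neighbours):
            edge.append(i)
    return edge


def kmeans_centroids(points, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a list of centroid tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
    return centroids


def reduce_training_set(X, y, k=5, clusters_per_class=3):
    """Edge samples plus per-class cluster centroids form the reduced dataset."""
    keep = edge_samples(X, y, k)
    X_red = [X[i] for i in keep]
    y_red = [y[i] for i in keep]
    for label in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == label]
        n = min(clusters_per_class, len(members))
        for centroid in kmeans_centroids(members, n):
            X_red.append(centroid)
            y_red.append(label)
    return X_red, y_red


# Synthetic two-class data (hypothetical, for demonstration only).
rng = random.Random(1)
X_demo = [(rng.gauss(-2, 1), rng.gauss(0, 1)) for _ in range(200)]
X_demo += [(rng.gauss(2, 1), rng.gauss(0, 1)) for _ in range(200)]
y_demo = [0] * 200 + [1] * 200
X_small, y_small = reduce_training_set(X_demo, y_demo)
print(len(X_demo), "->", len(X_small))
```

The reduced pair `(X_small, y_small)` could then be passed to any SVM trainer in place of the full dataset. The brute-force neighbour search here is itself quadratic in n, so a practical version for very large datasets would presumably need a spatial index or an approximate neighbour search.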