Optimization of Multi Kernel Parallel Support Vector Machine Based on Hadoop
Wei Nie, Binwen Fan, Xiaomin Kong, Qianqian Ma
DOI: https://doi.org/10.1109/imcec.2016.7867488
2016-01-01
Abstract: With the growth in data volume and sample counts, the traditional support vector machine (SVM) algorithm becomes impractical because of its excessive memory and time overhead. The traditional MapReduce-based parallel SVM splits the training data into multiple sub-training sets, trains an SVM on each subset, and then collects the resulting support vectors to obtain a classification model. This reduces training time, but its classification accuracy needs improvement. In addition, designing a suitable kernel function for a given problem is the core issue in support vector machines. Because a single kernel function has a fixed form and little room for adjustment, the generalization capability and robustness of single-kernel SVMs are limited. Therefore, in this paper, we propose a hybrid kernel function and a hybrid parallel support vector machine to improve classification accuracy and reduce training time. The experimental results show that this learning strategy improves classification accuracy, reduces training time, and speeds up classification.
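The abstract does not state which kernels are combined or how the data are partitioned, so the following is only a minimal sketch under stated assumptions: the hybrid kernel is taken to be a convex combination of a Gaussian (RBF) kernel and a polynomial kernel, and the sub-training sets are formed by a simple round-robin split. The function names and the parameters `weight`, `gamma`, `degree`, and `coef0` are hypothetical and are not taken from the paper.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (<x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def hybrid_kernel(x, y, weight=0.7, gamma=0.5, degree=2, coef0=1.0):
    """Assumed hybrid form: weight * RBF + (1 - weight) * polynomial.

    With 0 <= weight <= 1 this is a convex combination of two valid
    kernels, so the result is itself a valid (positive semidefinite) kernel.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (weight * rbf_kernel(x, y, gamma)
            + (1.0 - weight) * poly_kernel(x, y, degree, coef0))


def gram_matrix(samples, kernel=hybrid_kernel):
    """Kernel (Gram) matrix that a kernel SVM solver would consume."""
    return [[kernel(a, b) for b in samples] for a in samples]


def split_into_subsets(samples, n_parts):
    """Round-robin split into sub-training sets, one per map task (assumed scheme)."""
    return [samples[i::n_parts] for i in range(n_parts)]
```

In a MapReduce setting, each sub-training set returned by `split_into_subsets` would be handed to one map task that trains an SVM with the hybrid kernel, and the reduce step would gather the support vectors from all tasks. Mixing a local kernel (RBF) with a global one (polynomial) is one common way to widen the adjustable range that a single kernel lacks.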