The parallel algorithms for LIBSVM parameter optimization based on Spark
Kun Li, Peng Liu, Yajie Lv, Guopeng Zhang, Yihua Huang
DOI: https://doi.org/10.13232/j.cnki.jnju.2016.02.016
2016-01-01
Abstract: This work designs a parallel implementation of LIBSVM parameter optimization on a Spark cluster. LIBSVM is a widely used software package for model building, sample training, and result prediction. When LIBSVM is used to train on a data set, the choice of parameters, especially parameter C and parameter g, has a significant impact on the training results. LIBSVM uses a grid search algorithm to optimize the combination of C and g; once the data volume grows large enough, this search takes a long time even when it is run in parallel on a single computer. In recent years, with the development of big data and cluster parallel computing and the emergence of in-memory computing platforms such as Apache Spark, parameter optimization can be expected to become dramatically more efficient when it is carried out in parallel on a computing cluster. In this paper, we design and implement parallelized parameter optimization algorithms for LIBSVM based on the Spark parallel computing architecture. Experimental results show that parameter optimization with the coarse-grained parallel grid search algorithm proposed in this paper runs about 7 times as fast as the serial version, and that this speedup will grow further as the cluster is scaled up. In addition, applying a fine-grained parallel grid search on top of the coarse-grained result further improves the optimized combination of C and g.
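
The coarse-then-fine grid search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It makes several assumptions: `cv_accuracy` is a hypothetical synthetic scoring function standing in for a cross-validated LIBSVM run, the grid ranges and step sizes are illustrative, and a local thread pool stands in for the Spark cluster. The function names `frange`, `parallel_grid_search`, and `coarse_then_fine` are likewise invented for this example.

```python
import itertools
import math
from concurrent.futures import ThreadPoolExecutor


def cv_accuracy(log2c, log2g):
    """Hypothetical stand-in for cross-validated accuracy at C=2**log2c, g=2**log2g.

    Assumption: a smooth synthetic surface with its peak at (3.5, -6.25).
    A real run would train and cross-validate LIBSVM here instead.
    """
    return math.exp(-((log2c - 3.5) ** 2) / 50.0 - ((log2g + 6.25) ** 2) / 50.0)


def frange(start, stop, step):
    """Return start, start+step, ..., stop (inclusive) without float drift."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def parallel_grid_search(c_values, g_values, workers=4):
    """Score every (log2c, log2g) pair independently and return the best one.

    Each grid point is an independent task, which is what makes the search
    easy to distribute. On a Spark cluster the pool.map call below would
    roughly correspond to sc.parallelize(grid).map(...).collect().
    """
    grid = list(itertools.product(c_values, g_values))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda point: cv_accuracy(*point), grid))
    best = max(range(len(grid)), key=scores.__getitem__)
    return grid[best], scores[best]


def coarse_then_fine(workers=4):
    """Run a coarse search, then a finer search around the coarse optimum."""
    # Coarse pass: wide exponent ranges with a large step (illustrative values).
    (c0, g0), coarse_score = parallel_grid_search(
        frange(-5, 15, 2), frange(-15, 3, 2), workers
    )
    # Fine pass: one coarse step on each side of the coarse optimum, small step.
    (c1, g1), fine_score = parallel_grid_search(
        frange(c0 - 2, c0 + 2, 0.25), frange(g0 - 2, g0 + 2, 0.25), workers
    )
    return (c0, g0, coarse_score), (c1, g1, fine_score)


if __name__ == "__main__":
    coarse, fine = coarse_then_fine()
    print("coarse best (log2c, log2g, score):", coarse)
    print("fine best   (log2c, log2g, score):", fine)
```

Because the fine grid always contains the coarse optimum, the fine pass can only match or improve on the coarse score. The cost of the whole search is dominated by the number of grid points, which is why distributing the independent evaluations across workers pays off.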