Parallelizing Big Data Machine Learning Applications With Model Rotation

Bingjing Zhang, Bo Peng, Judy Qiu
DOI: https://doi.org/10.3233/978-1-61499-816-7-199
2017-01-01
Abstract: This paper proposes model rotation as a general approach to parallelizing big data machine learning applications. To solve the big model problem in parallelization, we distribute the model parameters across inter-node workers and rotate the model parts among them in a ring topology. The advantage of model rotation is that it maximizes the effect of parallel model updates on algorithm convergence while minimizing communication overhead. We formulate a solution using computation models, programming interfaces, and system implementations as design principles, and derive a machine learning framework with three algorithms built on top of it: Latent Dirichlet Allocation using Collapsed Gibbs Sampling, and Matrix Factorization using both Stochastic Gradient Descent and Cyclic Coordinate Descent. Performance results on an Intel Haswell cluster with up to 60 nodes show that our solution achieves faster model convergence and higher scalability than previous work by others.
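The ring rotation described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' framework: the names `rotation_schedule`, `run_epoch`, and `update` are hypothetical, and the sketch assumes one model part per worker, with workers simulated sequentially rather than running on separate nodes.

```python
def rotation_schedule(num_workers):
    """Return schedule[step][worker] = index of the model part held.

    Assumption for illustration: the model is split into exactly
    num_workers parts, and worker w starts with part w.
    """
    schedule = []
    held = list(range(num_workers))
    for _ in range(num_workers):
        schedule.append(list(held))
        # Each worker passes its part to the next worker in the ring,
        # so worker w receives the part previously held by worker w - 1.
        held = [held[(w - 1) % num_workers] for w in range(num_workers)]
    return schedule


def run_epoch(data_shards, model_parts, update):
    """One full rotation: every worker updates every model part once."""
    num_workers = len(data_shards)
    for held in rotation_schedule(num_workers):
        # On a real cluster these updates would run in parallel;
        # here they run one after another for illustration.
        for worker, part in enumerate(held):
            model_parts[part] = update(data_shards[worker], model_parts[part])
    return model_parts
```

At every step the parts held by the workers form a permutation, so no two workers update the same model part at once, and after `num_workers` steps each part has visited every data shard. Each step needs only one send and one receive per worker, which is the property the abstract refers to as minimizing communication overhead.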