PS2: Parameter Server on Spark

Zhipeng Zhang,Bin Cui,Yingxia Shao,Lele Yu,Jiawei Jiang,Xupeng Miao
DOI: https://doi.org/10.1145/3299869.3314038
2019-01-01
Abstract:Most of the data is extracted and processed by Spark in Tencent Machine Learning Platform. However, seldom of them use Spark MLlib, an official machine learning (ML) library on top of Spark due to its inefficiency. In contrast, systems like parameter servers, XGBoost and TensorFlow are more used, which incur expensive cost of transferring data in and out of Spark ecosystem. In this paper, we identify the causes of inefficiency in Spark MLlib and solve the problem by building parameter servers on top of Spark. We propose PS2, a parameter server architecture that integrates Spark without hacking the core of Spark. With PS2, we leverage the power of Spark for data processing and ML training, and parameter servers for maintaining ML models. By carefully analyzing Tencent ML workloads, we figure out a widely existing computation pattern for ML models---element-wise operations among multiple high dimensional vectors. Based on this observation, we propose a new data abstraction, called Dimension Co-located Vector (DCV) for efficient model management in PS2. A DCV is a distributed vector that considers locality in parameter servers and enables efficient computation with multiple co-located distributed vectors. For ease-of-use, we also design a wide variety of advanced operators for operating DCVs. Finally, we carefully implement the PS2 system and evaluate it against existing systems on both public and Tencent workloads. Empirical results demonstrate that PS2 can outperform Spark MLlib by up to 55.6X and specialized ML systems like Petuum by up to 3.7X.
What problem does this paper attempt to address?