DPS: A DSM-based Parameter Server for Machine Learning

Chenggen Sun, Yangyang Zhang, Weiren Yu, Richong Zhang, Md. Zakirul Alam Bhuiyan, Jianxin Li
DOI: https://doi.org/10.1109/ISPAN-FCST-ISCC.2017.48
2017-01-01
Abstract: With the emergence of big models with billions of parameters, the parameter server has attracted attention as a high-throughput distributed machine learning (ML) architecture for efficiently storing and updating model parameters during learning. Current parameter servers, such as Parameter Server and Petuum, do not address data management and lack a high-level data abstraction. Moreover, they have no task scheduling, do not fully utilize computing resources, and may lead to load imbalance. Their programming interfaces are overly complicated, and they do not support data flow operations (e.g. map/reduce), which are very useful for data preprocessing. These drawbacks limit the performance and ease of use of such parameter servers. In this paper, we propose DPS, a parameter server based on Distributed Shared Memory (DSM) for machine learning. DPS provides flexible consistency models, high-level data abstraction and management that support data flow operations, a lightweight task scheduling system, and a user-friendly programming interface to address the problems of the existing systems mentioned above. The experimental results show that DPS can reduce networking time by about 50% and achieve up to 1.9x the performance of Petuum, while the algorithms implemented on DPS use less code than those implemented on Petuum.