HotML: A DSM-based Machine Learning System for Social Networks
Yangyang Zhang, Jianxin Li, Chenggen Sun, Md Zakirul Alam Bhuiyan, Weiren Yu, Richong Zhang
DOI: https://doi.org/10.1016/j.jocs.2017.09.006
IF: 3.817
2017-01-01
Journal of Computational Science
Abstract: In the big data era, social networks such as Twitter, Weibo, and Facebook are becoming increasingly popular worldwide. To support social network analysis, many machine learning (ML) algorithms have been adopted for tasks such as user classification, link prediction, sentiment analysis, and recommendation. However, the datasets can be so large that training a model on a machine learning system may take days, so performance must be addressed to speed up training. In this paper, we propose HotML, a general machine learning system. HotML follows the parameter server (PS) architecture, in which servers manage the globally shared parameters, organized in a tabular structure, and workers process the dataset in parallel and update the global parameters. HotML builds on our prior work, DPS, which provides a high-level data abstraction, a lightweight task scheduling system, and SSP consistency. HotML improves on the DPS design by physically decoupling the PS server from the PS worker, and it adds flexible consistency models (SSPPush and SSPDrop in addition to SSP), fault tolerance (consistent server-side checkpoints and flexible worker-side checkpoints), and workload balancing. To demonstrate the performance and scalability of the proposed system, we conduct a series of experiments; the results show that HotML reduces networking time by about 74% and achieves up to 1.9× the performance of the popular ML system Petuum.
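
The parameter server layout described in the abstract (servers holding tabular shared parameters, workers pushing updates under a bounded-staleness rule) can be illustrated with a minimal single-process sketch. It assumes SSP refers to the stale synchronous parallel model commonly used in parameter servers, where a worker may run ahead of the slowest worker by at most a fixed number of clock ticks. The class and method names (`TableServer`, `inc`, `get`, `clock`, `can_proceed`) are hypothetical and are not HotML's API; SSPPush and SSPDrop are not modeled because the abstract does not define them.

```python
from collections import defaultdict


class TableServer:
    """Toy in-process stand-in for a PS server holding a table of parameter rows.

    Hypothetical illustration only; not the HotML or Petuum interface.
    """

    def __init__(self, num_workers, staleness):
        self.staleness = staleness            # max allowed clock gap (assumed SSP bound)
        self.clocks = [0] * num_workers       # one logical clock per worker
        # table[row][col] -> parameter value, defaulting to 0.0
        self.table = defaultdict(lambda: defaultdict(float))

    def inc(self, row, col, delta):
        """Apply an additive update from a worker to one table cell."""
        self.table[row][col] += delta

    def get(self, row, col):
        """Read the current value of one table cell."""
        return self.table[row][col]

    def clock(self, worker_id):
        """Mark the end of one iteration for the given worker."""
        self.clocks[worker_id] += 1

    def can_proceed(self, worker_id):
        """A worker may continue only if it is at most `staleness`
        ticks ahead of the slowest worker."""
        return self.clocks[worker_id] - min(self.clocks) <= self.staleness
```

With `staleness=1` and two workers, a worker that has finished two iterations while the other has finished none would have to wait; once the slower worker completes an iteration, the faster one may continue. A real system would block or retry on the server side rather than expose a boolean check, and would distribute rows across several server processes.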