Exploiting Sample Diversity in Distributed Machine Learning Systems.

Zhiqiang Liu, Xuanhua Shi, Hai Jin
DOI: https://doi.org/10.1109/ccgrid.2016.75
2016-01-01
Abstract: As the scale of machine learning grows, there is a growing need for distributed systems that can execute machine learning algorithms on large clusters. Currently, most distributed machine learning systems are built on iterative optimization algorithms and the parameter server framework. However, most systems compute on all samples in every iteration, which consumes excessive computing resources because the number of samples is typically very large. In this paper, we study sample diversity and find that most samples contribute little to model updating during most iterations. Based on these findings, we propose a new iterative optimization algorithm that reduces the computation load by reusing results computed in earlier iterations. Experiments demonstrate that, compared with current methods, the proposed algorithm reduces the overall computation load by about 23% without increasing communication.
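The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea (skip samples that currently contribute little and reuse their previously computed gradient), not the authors' method. It assumes a toy one-dimensional least-squares model on a single machine; the function name `train_with_reuse` and the parameters `threshold` and `refresh_every` are hypothetical choices made for illustration.

```python
def train_with_reuse(xs, ys, lr=0.1, iters=50, threshold=1e-3, refresh_every=5):
    """Fit y ~ w * x by gradient descent, reusing cached per-sample gradients.

    Hypothetical illustration: a sample whose last gradient was smaller than
    `threshold` is skipped and its cached gradient is reused, except on every
    `refresh_every`-th iteration, when all samples are recomputed.
    Returns the fitted weight and the number of gradient evaluations performed.
    """
    w = 0.0
    n = len(xs)
    cached = [None] * n          # last gradient computed for each sample
    evaluations = 0              # how many per-sample gradients were computed

    for it in range(iters):
        full_refresh = (it % refresh_every == 0)
        total = 0.0
        for i in range(n):
            needs_compute = (
                cached[i] is None
                or full_refresh
                or abs(cached[i]) >= threshold
            )
            if needs_compute:
                # Gradient of 0.5 * (w * x - y)^2 with respect to w.
                cached[i] = (w * xs[i] - ys[i]) * xs[i]
                evaluations += 1
            # Low-contribution samples fall through and reuse the cached value.
            total += cached[i]
        w -= lr * total / n

    return w, evaluations


if __name__ == "__main__":
    xs = [0.5, 1.0, 1.5, 2.0]
    ys = [2.0 * x for x in xs]   # synthetic data with true weight 2.0
    w, evaluations = train_with_reuse(xs, ys)
    print("fitted weight:", w)
    print("gradient evaluations:", evaluations, "of", 50 * len(xs))
```

In this sketch the savings come entirely from the skipped evaluations, and the periodic refresh bounds how stale a reused gradient can become. How the paper selects samples, and how the reuse interacts with the parameter server, is not stated in the abstract.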