Lock-Free Parallelization for Variance-Reduced Stochastic Gradient Descent on Streaming Data

Yaqiong Peng, Zhiyu Hao, Xiaochun Yun
DOI: https://doi.org/10.1109/TPDS.2020.2987867
IF: 5.3
2020-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: Stochastic Gradient Descent (SGD) is an iterative algorithm for fitting a model to the training dataset in machine learning problems. With low computation cost, SGD is especially suited for learning from large datasets. However, the variance of SGD tends to be high because it uses only a single data point to determine the update direction at each iteration of gradient descent, rather than all available training data points. Recent research has proposed variance-reduced variants of SGD by incorporating a correction term to approximate full-data gradients. However, it is difficult to parallelize such variants with high performance and accuracy, especially on streaming data. As parallelization is a crucial requirement for large-scale applications, this article focuses on the parallel setting in a multicore machine and presents LFS-STRSAGA, a lock-free approach to parallelizing variance-reduced SGD on streaming data. LFS-STRSAGA embraces a lock-free data structure to process the arrival of streaming data in parallel, and asynchronously maintains the essential information to approximate full-data gradients with low cost. Both our theoretical and empirical results show that LFS-STRSAGA matches the accuracy of the state-of-the-art variance-reduced SGD on streaming data under a sparsity assumption (common in machine learning problems), and that LFS-STRSAGA reduces the model update time by over 98 percent.
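The "correction term" mentioned in the abstract can be illustrated with a minimal sketch of plain sequential SAGA, a well-known variance-reduced SGD variant. This is not LFS-STRSAGA: it assumes a fixed (non-streaming) dataset, a single thread, and a least-squares loss, and the function name and parameters are hypothetical choices made for illustration only.

```python
import random

def saga_least_squares(X, y, step=0.05, epochs=500, seed=0):
    """Sequential SAGA for f_i(w) = 0.5 * (x_i . w - y_i)^2 on a fixed dataset.

    Illustrative sketch only: no streaming arrivals, no parallelism, no locks.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d

    # Table of the most recently computed gradient for every data point,
    # initialised at w = 0 (where the residual x_i . w - y_i equals -y_i).
    table = [[-y[i] * X[i][j] for j in range(d)] for i in range(n)]
    # Running average of the table; it approximates the full-data gradient.
    avg = [sum(table[i][j] for i in range(n)) / n for j in range(d)]

    for _ in range(epochs * n):
        i = rng.randrange(n)  # sample a single data point, as in SGD
        residual = sum(X[i][j] * w[j] for j in range(d)) - y[i]
        grad = [residual * X[i][j] for j in range(d)]

        # Variance-reduced direction: fresh gradient, minus the stale stored
        # gradient for point i, plus the average of all stored gradients.
        for j in range(d):
            w[j] -= step * (grad[j] - table[i][j] + avg[j])

        # Update the average in O(d) and overwrite the stored gradient.
        for j in range(d):
            avg[j] += (grad[j] - table[i][j]) / n
        table[i] = grad

    return w
```

The per-point gradient table and its running average are the "essential information" a SAGA-style method has to keep consistent, which is what makes lock-free parallel updates on arriving data nontrivial.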