Abstract: Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scaling up SGD in applications involving non-convex functions and large datasets. We address the bottlenecks that arise in both shared-memory and distributed-memory settings. Typically, the former is limited by computation resources and bandwidth, whereas the latter suffers from communication overheads. We propose a unified distributed and parallel implementation of SGD (named DPSGD) that relies on both asynchronous distribution and lock-free parallelism. By combining the two strategies in a single framework, DPSGD strikes a better trade-off between local computation and communication. The convergence properties of DPSGD are studied for non-convex problems such as those arising in statistical modelling and machine learning. Our theoretical analysis shows that DPSGD achieves speed-up with respect to the number of cores and the number of workers while guaranteeing an asymptotic convergence rate, provided that the number of cores and the number of workers are each bounded by a function of T, the number of iterations. The potential gains of DPSGD are demonstrated empirically on a stochastic variational inference problem (Latent Dirichlet Allocation) and on a deep reinforcement learning (DRL) problem, advantage actor-critic (A2C), resulting in two algorithms: DPSVI and HSA2C. The empirical results validate our theoretical findings. Comparative studies evaluate the performance of DPSGD against state-of-the-art DRL algorithms.
