Abstract: In this paper, we consider stochastic optimization problems in which the objective function is the expectation of a (possibly non-convex) cost function parametrized by a random variable. Although convergence speed is critical for many emerging applications, most existing stochastic optimization methods converge slowly. Furthermore, the rise of parallel computing has increased the demand for new stochastic optimization schemes that support parallel optimization and can be implemented in distributed systems. We propose a fast parallel stochastic optimization framework that can solve a large class of possibly non-convex stochastic optimization problems that may arise in applications with multi-agent systems. In the proposed method, each agent updates its control variable in parallel by independently solving a convex quadratic subproblem. We establish convergence of the proposed method to the optimal solution for convex problems and to a stationary point for general non-convex problems. The proposed algorithm applies to optimization problems arising in important applications from various fields, such as machine learning and wireless networks. As a representative application of the framework, we focus on large-scale support vector machines and demonstrate how our algorithm can efficiently solve this problem, especially in modern applications with huge datasets. Using popular real-world datasets, we present experimental results that compare the proposed framework with state-of-the-art methods from the literature. The numerical results show that the proposed method converges significantly faster than the state-of-the-art methods while having the same or lower complexity and storage requirements.

Adaptive step size rules for stochastic optimization in large-scale learning

Barzilai-Borwein Step Size for Stochastic Gradient Descent

Parallel Stochastic Optimization Framework for Large-Scale Non-Convex Stochastic Problems

Accelerated Stochastic ADMM with Variance Reduction

A new inexact stochastic recursive gradient descent algorithm with Barzilai–Borwein step size in machine learning

A stochastic variance reduced gradient method with adaptive step for stochastic optimization

Barzilai-Borwein-based Adaptive Learning Rate for Deep Learning

Variable Metric Proximal Gradient Method with Diagonal Barzilai-Borwein Stepsize

A variable metric mini-batch proximal stochastic recursive gradient algorithm with diagonal Barzilai-Borwein stepsize

Gradient Methods with Adaptive Step-Sizes

Accelerating Mini-batch SARAH by Step Size Rules

Adaptive learning rate optimization algorithms with dynamic bound based on Barzilai-Borwein method

AdaBB: Adaptive Barzilai-Borwein Method for Convex Optimization

Learning the Step-size Policy for the Limited-Memory Broyden-Fletcher-Goldfarb-Shanno Algorithm

Regularized Barzilai-Borwein method

Stagewise Accelerated Stochastic Gradient Methods for Nonconvex Optimization

On the acceleration of the Barzilai-Borwein method

Painless Stochastic Conjugate Gradient for Large-Scale Machine Learning

Adaptive Coordinate-Wise Step Sizes for Quasi-Newton Methods: A Learning-to-Optimize Approach

Stochastic Ratios Tracking Algorithm for Large Scale Machine Learning Problems

Adaptive step size selection in distributed optimization with observation noise and unknown stochastic target variation