Abstract: We consider stochastic optimization problems in which the objective is the expectation of a (possibly non-convex) cost function parametrized by a random variable. Although convergence speed is critical for many emerging applications, most existing stochastic optimization methods converge slowly. Moreover, the rise of parallel computing has increased the demand for stochastic optimization schemes that can run in parallel on distributed systems. We propose a fast parallel stochastic optimization framework for a large class of possibly non-convex stochastic optimization problems, such as those arising in applications with multi-agent systems. In the proposed method, each agent updates its control variable in parallel by independently solving a convex quadratic subproblem. We establish convergence of the proposed method to the optimal solution for convex problems and to a stationary point for general non-convex problems. The algorithm applies to a large class of optimization problems from fields such as machine learning and wireless networks. As a representative application, we focus on large-scale support vector machines and show how the algorithm solves this problem efficiently, especially in modern applications with huge datasets. Using popular real-world datasets, we present experiments that compare the proposed framework with state-of-the-art methods from the literature. The numerical results show that the proposed method converges significantly faster than these methods while having the same or lower complexity and storage requirements.
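The per-agent update described in the abstract can be illustrated with a small sketch. The abstract does not specify the form of the quadratic subproblem, so everything below is an assumption made for illustration: each agent minimizes a linearization of the cost around the current iterate plus a proximal term with weight `tau`, using a running average `d` of sampled gradients, and the iterate moves toward the subproblem solutions with a diminishing step. The names `quadratic_block_update`, `parallel_stochastic_solve`, and `make_noisy_oracle`, the step-size exponents, and the least-squares test problem are all hypothetical and are not taken from the paper.

```python
import random


def quadratic_block_update(x_block, d_block, tau):
    """Minimize d^T (y - x) + (tau / 2) * ||y - x||^2 over y for one agent.

    The subproblem is strongly convex for tau > 0, so its minimizer has the
    closed form y = x - d / tau. (Assumed surrogate, not the paper's.)
    """
    return [xi - di / tau for xi, di in zip(x_block, d_block)]


def parallel_stochastic_solve(grad_oracle, x0, blocks, num_iters=2000, tau=2.0):
    """Run the assumed scheme; blocks is a list of (start, stop) index pairs,
    one per agent."""
    x = list(x0)
    d = [0.0] * len(x)  # running average of sampled gradients
    for t in range(1, num_iters + 1):
        rho = t ** -0.6    # averaging weight (assumed schedule)
        gamma = t ** -0.7  # step size (assumed schedule)
        g = grad_oracle(x)
        d = [(1.0 - rho) * di + rho * gi for di, gi in zip(d, g)]
        # Each agent's subproblem only touches its own block, so these
        # updates are independent; they are looped sequentially here but
        # could be dispatched to separate workers.
        x_hat = list(x)
        for lo, hi in blocks:
            x_hat[lo:hi] = quadratic_block_update(x[lo:hi], d[lo:hi], tau)
        # Move part of the way toward the subproblem solutions.
        x = [xi + gamma * (xh - xi) for xi, xh in zip(x, x_hat)]
    return x


def make_noisy_oracle(x_star, batch_size=8, noise=0.1, seed=0):
    """Mini-batch gradient of 0.5 * (a^T x - b)^2 with b = a^T x_star + noise."""
    rng = random.Random(seed)

    def oracle(x):
        g = [0.0] * len(x)
        for _ in range(batch_size):
            a = [rng.gauss(0.0, 1.0) for _ in x]
            b = sum(ai * si for ai, si in zip(a, x_star)) + rng.gauss(0.0, noise)
            r = sum(ai * xi for ai, xi in zip(a, x)) - b
            g = [gi + r * ai / batch_size for gi, ai in zip(g, a)]
        return g

    return oracle


if __name__ == "__main__":
    x_star = [1.0, -2.0, 0.5, 3.0]
    oracle = make_noisy_oracle(x_star)
    # Two hypothetical agents, each owning two coordinates.
    x = parallel_stochastic_solve(oracle, [0.0] * 4, [(0, 2), (2, 4)])
    print([round(v, 3) for v in x])
```

With this surrogate the subproblem separates across agents and has a closed-form solution, which is why no agent needs another agent's block to compute its update. A different surrogate, for example one that keeps a convex part of the cost intact, would need a numerical solver per agent instead.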

Efficient mini-batch training for stochastic optimization

Parallel Stochastic Optimization Framework for Large-Scale Non-Convex Stochastic Problems

Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models

Mini-batch Quasi-Newton Optimization for Large Scale Linear Support Vector Regression

Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization

Batch Size Matters: A Diffusion Approximation Framework on Nonconvex Stochastic Gradient Descent

A Hybrid Stochastic-Deterministic Minibatch Proximal Gradient Method for Efficient Optimization and Generalization

Stability and Generalization for Minibatch SGD and Local SGD

Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization

The Mini-batch Stochastic Conjugate Algorithms with the unbiasedness and Minimized Variance Reduction

Accelerating Minibatch Stochastic Gradient Descent Using Typicality Sampling

Stagewise Accelerated Stochastic Gradient Methods for Nonconvex Optimization

Mini-batch stochastic subgradient for functional constrained optimization

Optimal Adaptive and Accelerated Stochastic Gradient Descent

Aiming towards the minimizers: fast convergence of SGD for overparametrized problems

Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization

Statistical Analysis of Fixed Mini-Batch Gradient Descent Estimator

Unlocking optimal batch size schedules using continuous-time control and perturbation theory

Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods

Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms