Abstract: We propose and analyze several stochastic gradient algorithms for finding stationary points or local minima in nonconvex finite-sum and online optimization problems, possibly with a nonsmooth regularizer. First, we propose a simple proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+. We provide a clean and tight analysis of ProxSVRG+ showing that it outperforms deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, thereby solving an open problem posed in Reddi et al. (2016b). ProxSVRG+ also uses far fewer proximal oracle calls than ProxSVRG (Reddi et al., 2016b) and extends to the online setting by avoiding full gradient computations. We then propose an optimal algorithm, called SSRGD, based on SARAH (Nguyen et al., 2017), and show that SSRGD further improves the gradient complexity of ProxSVRG+ and achieves the optimal upper bound, matching the known lower bound (Fang et al., 2018; Li et al., 2021). Moreover, we show that both ProxSVRG+ and SSRGD adapt automatically to local structure of the objective function, such as the Polyak-\L{}ojasiewicz (PL) condition for nonconvex functions in the finite-sum case: we prove that both algorithms automatically switch to faster global linear convergence without the restarts used in the prior work ProxSVRG (Reddi et al., 2016b). Finally, we focus on the more challenging problem of finding an $(\epsilon, \delta)$-local minimum rather than just an $\epsilon$-approximate (first-order) stationary point, which may be a bad, unstable saddle point. We show that SSRGD can find an $(\epsilon, \delta)$-local minimum by simply adding random perturbations. Our algorithm is almost as simple as its counterpart for finding stationary points and achieves similar optimal rates.
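The abstract does not spell out the update rules, so the following is only a minimal sketch of the general idea it names: a SARAH-style recursive gradient estimator (Nguyen et al., 2017) combined with a proximal step for a nonsmooth regularizer. It is not the authors' SSRGD or ProxSVRG+; in particular it omits the random perturbations and uses a full gradient at the start of each epoch (finite-sum setting). The least-squares components, the $\ell_1$ regularizer, and all names (`prox_sarah`, `make_data`, `eta`, `lam`, and so on) are assumptions chosen for illustration.

```python
import math
import random


def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def minibatch_grad(x, data, idx):
    """Average gradient of f_i(x) = 0.5 * (a_i . x - b_i)^2 over the indices idx."""
    g = [0.0] * len(x)
    for i in idx:
        a, b = data[i]
        r = dot(a, x) - b
        for j, aj in enumerate(a):
            g[j] += r * aj / len(idx)
    return g


def objective(x, data, lam):
    """Smooth finite-sum loss plus the nonsmooth l1 regularizer."""
    smooth = sum(0.5 * (dot(a, x) - b) ** 2 for a, b in data) / len(data)
    return smooth + lam * sum(abs(xi) for xi in x)


def make_data(n=50, seed=0):
    """Hypothetical synthetic regression data with a sparse ground truth."""
    rng = random.Random(seed)
    x_true = [1.0, -2.0, 0.0, 0.0, 0.5]
    data = []
    for _ in range(n):
        a = [rng.gauss(0.0, 1.0) for _ in x_true]
        data.append((a, dot(a, x_true) + 0.01 * rng.gauss(0.0, 1.0)))
    return data


def prox_sarah(data, lam, eta, epochs, inner, batch, seed=0):
    """Proximal steps driven by a SARAH-style recursive gradient estimator."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0][0])
    x = [0.0] * d
    for _ in range(epochs):
        # Epoch start: reset the estimator with the full gradient.
        v = minibatch_grad(x, data, range(n))
        for _ in range(inner):
            # Proximal gradient step using the current estimator v.
            x_new = soft_threshold(
                [xi - eta * vi for xi, vi in zip(x, v)], eta * lam
            )
            # Recursive update on one shared minibatch:
            # v <- grad_I(x_new) - grad_I(x) + v
            idx = [rng.randrange(n) for _ in range(batch)]
            g_new = minibatch_grad(x_new, data, idx)
            g_old = minibatch_grad(x, data, idx)
            v = [gn - go + vi for gn, go, vi in zip(g_new, g_old, v)]
            x = x_new
    return x
```

A call such as `prox_sarah(make_data(), lam=0.01, eta=0.03, epochs=30, inner=10, batch=5)` runs the sketch on the synthetic data; the step size and loop lengths here are arbitrary illustrative choices, not the tuned values from the paper's analysis.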

Byzantine-robust decentralized stochastic optimization with stochastic gradient noise-independent learning error

Asynchronous Byzantine-Robust Stochastic Aggregation with Variance Reduction for Distributed Learning

Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient

Prox-DBRO-VR: A Unified Analysis on Decentralized Byzantine-Resilient Composite Stochastic Optimization with Variance Reduction and Non-Asymptotic Convergence Rates

Efficient Byzantine-Resilient Stochastic Gradient Desce

Byzantine-resilient Decentralized Stochastic Gradient Descent

Byzantine-Resilient Non-Convex Stochastic Gradient Descent

Decentralized Stochastic Proximal Gradient Descent with Variance Reduction over Time-varying Networks

On the Tradeoff between Privacy Preservation and Byzantine-Robustness in Decentralized Learning

Communication-Efficient and Byzantine-Robust Distributed Stochastic Learning with Arbitrary Number of Corrupted Workers

Stochastic Optimization with Non-stationary Noise

Asynchronous Decentralized Accelerated Stochastic Gradient Descent

Decentralized Stochastic Subgradient Methods for Nonsmooth Nonconvex Optimization

Decentralized Stochastic Optimization with Inherent Privacy Protection

Stochastic learning via optimizing the variational inequalities

Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization

Resilient Two-Time-Scale Local Stochastic Gradient Descent for Byzantine Federated Learning

Generalization Error Matters in Decentralized Learning Under Byzantine Attacks

Asynchronous Stochastic Proximal Methods for Nonconvex Nonsmooth Optimization

On the Communication Complexity of Decentralized Bilevel Optimization

A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization