Abstract: When the nonconvex problem is complicated by stochasticity, the sample complexity of stochastic first-order methods may depend linearly on the problem dimension, which is undesirable for large-scale problems. In this work, we propose dimension-insensitive stochastic first-order methods (DISFOMs) to address nonconvex optimization with expected-valued objective functions. Our algorithms allow for non-Euclidean and non-smooth distance functions as the proximal terms. Under mild assumptions, we show that DISFOM using minibatches to estimate the gradient enjoys a sample complexity of $ \mathcal{O} ( (\log d) / \epsilon^4 ) $ to obtain an $\epsilon$-stationary point. Furthermore, we prove that DISFOM employing variance reduction can sharpen this bound to $\mathcal{O} ( (\log d)^{2/3}/\epsilon^{10/3} )$, which is perhaps the best-known sample complexity result in terms of $d$. We provide two choices of the non-smooth distance functions, both of which allow for closed-form solutions to the proximal step. Numerical experiments are conducted to illustrate the dimension-insensitive property of the proposed frameworks.
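The abstract states that the non-smooth distance functions admit closed-form proximal steps but does not name them here. As an illustration only, the sketch below assumes the distance function is the squared $\ell_1$ norm, $\tfrac12\|x-y\|_1^2$, with no constraint ($X=\mathbb{R}^d$); this is not necessarily one of the paper's two choices. Under that assumption, $\arg\min_z \langle g, z-x_k\rangle + \tfrac{1}{2\eta}\|z-x_k\|_1^2$ moves only a coordinate $i^*$ with the largest $|g_i|$, by $-\eta g_{i^*}$. The toy oracle $F(x,\zeta)=\tfrac12\|x-\zeta\|_2^2$ and the names `l1_prox_step`, `minibatch_grad` and `run` are hypothetical.

```python
import numpy as np


def l1_prox_step(x, g, eta):
    """Closed-form minimizer of <g, z - x> + ||z - x||_1**2 / (2 * eta) over z in R^d.

    For a fixed t = ||z - x||_1, the linear term is smallest (= -t * ||g||_inf) when
    all of t sits on a coordinate i* with the largest |g_i|. Minimizing
    -t * ||g||_inf + t**2 / (2 * eta) gives t = eta * ||g||_inf, so only
    coordinate i* moves, by -eta * g[i*].
    """
    z = x.copy()
    i_star = int(np.argmax(np.abs(g)))  # ties: np.argmax returns the first maximizer
    z[i_star] -= eta * g[i_star]
    return z


def minibatch_grad(x, batch):
    """Hypothetical oracle for F(x, zeta) = 0.5 * ||x - zeta||_2**2.

    grad_x F(x, zeta) = x - zeta, so the minibatch average estimates
    grad f(x) = x - E[zeta].
    """
    return x - batch.mean(axis=0)


def run(d=200, iters=500, batch_size=32, eta=0.5, seed=0):
    """Minibatch stochastic steps with the assumed squared-l1 proximal term (toy problem)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(d)
    mean[:5] = 1.0  # hypothetical sparse minimizer E[zeta]
    x = np.zeros(d)
    for _ in range(iters):
        batch = mean + 0.1 * rng.standard_normal((batch_size, d))
        x = l1_prox_step(x, minibatch_grad(x, batch), eta)
    return x, mean
```

Calling `run()` returns the final iterate and the toy minimizer. Under this assumed distance function each step changes a single coordinate, a concrete example of how a non-Euclidean proximal term changes the update compared with a plain Euclidean (SGD-style) step.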

What problem does this paper attempt to address?

The problem this paper attempts to solve is that, in high-dimensional nonconvex stochastic optimization, the sample complexity of existing stochastic first-order methods (S-FOMs) may grow linearly with the problem dimension $d$, which is undesirable for large-scale problems. To meet this challenge, the authors propose **dimension-insensitive stochastic first-order methods (DISFOMs)** for nonconvex optimization problems with expected-valued objective functions. These algorithms allow non-Euclidean and non-smooth distance functions to serve as proximal terms. Under mild assumptions, the authors prove that DISFOM with minibatch gradient estimates obtains an $\epsilon$-stationary point with a sample complexity of $O\left(\frac{\log d}{\epsilon^4}\right)$. With variance reduction, this bound improves to $O\left(\frac{(\log d)^{2/3}}{\epsilon^{10/3}}\right)$, which may be the best-known sample complexity result in terms of the dimension $d$ so far.

### Key Contributions

1. **Propose DISFOMs**: first-order methods that allow non-Euclidean and non-smooth distance functions as proximal terms.
2. **Sample Complexity Analysis**:
   - With minibatch gradient estimates, the sample complexity is $O\left(\frac{\log d}{\epsilon^4}\right)$.
   - With variance reduction, the sample complexity is $O\left(\frac{(\log d)^{2/3}}{\epsilon^{10/3}}\right)$.
3. **Theoretical Guarantees**: convergence and sample complexity of DISFOMs are proved under mild assumptions.
4. **Numerical Experiments**: the dimension-insensitive behavior of the proposed frameworks is demonstrated on high-dimensional problems and compared with existing popular algorithms.
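The summary does not say which variance-reduction estimator the paper uses. To illustrate the general idea only, the sketch below swaps in a SPIDER/SARAH-style recursive estimator (a standard technique, not necessarily the paper's): a large-batch gradient every `q` steps, and in between the update $v_k = v_{k-1} + \nabla F(x_k,S_k) - \nabla F(x_{k-1},S_k)$ with the same minibatch $S_k$ at both points. It drives a proximal step that assumes, for illustration, the squared $\ell_1$ distance $\tfrac12\|x-y\|_1^2$ with $X=\mathbb{R}^d$, whose closed-form solution moves only the coordinate with the largest $|v_i|$. The least-squares data model, batch sizes, step size and all function names are hypothetical.

```python
import numpy as np


def sample(rng, n, x_true, noise=0.1):
    """Hypothetical data model: rows a ~ N(0, I_d) and b = a @ x_true + noise."""
    a = rng.standard_normal((n, x_true.size))
    b = a @ x_true + noise * rng.standard_normal(n)
    return a, b


def grad(x, a, b):
    """Minibatch gradient of F(x, (a, b)) = 0.5 * (a @ x - b)**2, averaged over rows."""
    return a.T @ (a @ x - b) / a.shape[0]


def l1_prox_step(x, v, eta):
    """Assumed squared-l1 proximal step: move only the coordinate with the largest |v_i|."""
    z = x.copy()
    i = int(np.argmax(np.abs(v)))
    z[i] -= eta * v[i]
    return z


def spider_run(d=50, iters=3000, q=50, big=2000, small=50, eta=0.2, seed=0):
    """SPIDER/SARAH-style recursive gradient estimator driving the squared-l1 step."""
    rng = np.random.default_rng(seed)
    x_true = np.zeros(d)
    x_true[:3] = [1.0, -2.0, 0.5]  # hypothetical sparse ground truth
    x = np.zeros(d)
    x_prev = x.copy()
    v = np.zeros(d)
    for k in range(iters):
        if k % q == 0:
            # Periodic refresh: large-batch estimate of grad f(x).
            v = grad(x, *sample(rng, big, x_true))
        else:
            # Recursive correction: the SAME minibatch is evaluated at x and x_prev,
            # so only the gradient difference has to be estimated from small batches.
            a, b = sample(rng, small, x_true)
            v = v + grad(x, a, b) - grad(x_prev, a, b)
        x_prev, x = x, l1_prox_step(x, v, eta)
    return x, x_true
```

In this toy model $f(x)=\tfrac12\|x-x_{\text{true}}\|_2^2$ plus a constant, so `spider_run()` should return an iterate close to `x_true`. The design point is that the small batches only need to estimate a gradient *difference*, whose variance shrinks as consecutive iterates get closer.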
### Mathematical Symbols and Formulas

- **Objective Function**: \[ f(x)=\mathbb{E}[F(x,\zeta)] \]
- **Constraints**: \[ x\in X \]
- **Residual Function** (with $\delta_X$ the indicator function of $X$): \[ r(\bar{x})\triangleq\text{dist}_{\|\cdot\|_\infty}(0,\partial(f + \delta_X)(\bar{x})) \]
- **Sample Complexity**:
  - Minibatch gradient estimates: \[ O\left(\frac{\log d}{\epsilon^4}\right) \]
  - Variance reduction: \[ O\left(\frac{(\log d)^{2/3}}{\epsilon^{10/3}}\right) \]

### Conclusion

This paper proposes DISFOMs, which achieve dimension-insensitive sample complexity (depending on $d$ only through $\log d$) in high-dimensional nonconvex stochastic optimization. This provides new theoretical and practical tools for large-scale optimization problems.
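The residual $r(\bar{x})$ in the formula list measures stationarity in the $\ell_\infty$ norm. The sketch below evaluates it in one special case assumed for illustration: $f$ is continuously differentiable, so $\partial(f+\delta_X)(\bar{x}) = \nabla f(\bar{x}) + N_X(\bar{x})$ with $N_X$ the normal cone of $X$, and $X$ is a box $[\ell, u]$. The paper's setting may be more general, and `residual_box_linf` is a hypothetical helper. Because the normal cone of a box is a product of per-coordinate cones, the $\ell_\infty$ distance reduces to the maximum of per-coordinate distances.

```python
import numpy as np


def residual_box_linf(x, grad, lower, upper, tol=1e-12):
    """r(x) = dist_inf(0, grad f(x) + N_X(x)) for smooth f and a box X = [lower, upper].

    Per coordinate i the set is grad_i + N_i, where N_i is {0} in the interior,
    (-inf, 0] at the lower bound, [0, inf) at the upper bound and R if lower == upper.
    The l_inf distance to the product set is the max of the 1-D distances.
    """
    at_lo = np.abs(x - lower) <= tol
    at_hi = np.abs(x - upper) <= tol
    per_coord = np.abs(grad)  # interior: distance from 0 to {grad_i}
    per_coord = np.where(at_lo, np.maximum(-grad, 0.0), per_coord)  # set (-inf, grad_i]
    per_coord = np.where(at_hi, np.maximum(grad, 0.0), per_coord)  # set [grad_i, inf)
    per_coord = np.where(at_lo & at_hi, 0.0, per_coord)  # fixed coordinate: set is R
    return float(per_coord.max())
```

Passing `-np.inf` and `np.inf` as bounds recovers the unconstrained case, where the residual is simply $\|\nabla f(\bar{x})\|_\infty$.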

Stochastic First-Order Methods with Non-smooth and Non-Euclidean Proximal Terms for Nonconvex High-Dimensional Stochastic Optimization

Variance-reduced first-order methods for deterministically constrained stochastic nonconvex optimization with strong convergence guarantees

Online Optimization Perspective on First-Order and Zero-Order Decentralized Nonsmooth Nonconvex Stochastic Optimization

Stochastic Optimization for Non-convex Inf-Projection Problems

Riemannian Stochastic Proximal Gradient Methods for Nonsmooth Optimization over the Stiefel Manifold

An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization

O(log T) Projections for Stochastic Optimization of Smooth and Strongly Convex Functions

New nonasymptotic convergence rates of stochastic proximal point algorithm for convex optimization problems

The Stochastic Proximal Distance Algorithm

Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization

First-Order Methods for Nonsmooth Nonconvex Functional Constrained Optimization with or without Slater Points

Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization

A Stochastic Quasi-Newton Method for Non-convex Optimization with Non-uniform Smoothness

Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity

High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise

First-order Methods for Affinely Constrained Composite Non-convex Non-smooth Problems: Lower Complexity Bound and Near-optimal Methods

On the Complexity of Finite-Sum Smooth Optimization under the Polyak-Łojasiewicz Condition

Stochastic Differential Equations for Modeling First Order Optimization Methods

Empirical Risk Minimization for Stochastic Convex Optimization: $O(1/n)$- and $O(1/n^2)$-Type of Risk Bounds.

An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum Optimization