Abstract: We consider the minimization of a Lipschitz continuous and expectation-valued function defined as $f(\mathbf{x}) \triangleq \mathbb{E}[{\tilde f}(\mathbf{x}, \boldsymbol{\xi})]$ over a closed and convex set. Our focus lies on obtaining asymptotic guarantees as well as rate and complexity guarantees for computing an approximate stationary point (in a Clarke sense) via zeroth-order schemes. We adopt a smoothing-based approach reliant on minimizing $f_{\eta}$, where $f_{\eta}(\mathbf{x}) = \mathbb{E}_{\mathbf{u}}[f(\mathbf{x}+\eta \mathbf{u})]$, $\mathbf{u}$ is a random variable defined on a unit sphere, and $\eta > 0$. It has been observed that a stationary point of the $\eta$-smoothed problem is a $2\eta$-stationary point for the original problem in the Clarke sense. In such a setting, we develop two sets of schemes with promising empirical behavior. (I) We develop a smoothing-enabled variance-reduced zeroth-order gradient framework (VRG-ZO) and make two sets of contributions for the sequence generated by the proposed zeroth-order gradient scheme. (a) The residual function of the smoothed problem tends to zero almost surely along the generated sequence, which allows for guarantees for $\eta$-Clarke stationary solutions of the original problem. (b) Computing an $\mathbf{x}$ such that the expected norm of the residual of the $\eta$-smoothed problem is within $\epsilon$ requires no more than $O(\eta^{-1} \epsilon^{-2})$ projection steps and $O\left(\eta^{-2} \epsilon^{-4}\right)$ function evaluations. (II) Our second scheme is a zeroth-order stochastic quasi-Newton scheme (VRSQN-ZO) reliant on a combination of randomized and Moreau smoothing; the corresponding iteration and sample complexities for this scheme are $O\left(\eta^{-5}\epsilon^{-2}\right)$ and $O\left(\eta^{-7}\epsilon^{-4}\right)$, respectively.
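The smoothing-based idea in the abstract can be illustrated with a minimal sketch. It is not the paper's VRG-ZO scheme; it is only a generic projected zeroth-order method under stated assumptions. The sketch assumes the standard spherical two-point estimator $(n/\eta)\,({\tilde f}(\mathbf{x}+\eta \mathbf{v}, \boldsymbol{\xi}) - {\tilde f}(\mathbf{x}, \boldsymbol{\xi}))\,\mathbf{v}$ with $\mathbf{v}$ uniform on the unit sphere, a box as the closed convex set, a uniform scalar as the noise sample, and a mini-batch that grows with the iteration count as one plausible way to reduce variance. All function names and parameters below are hypothetical.

```python
import math
import random


def sample_unit_sphere(n, rng):
    """Draw a point uniformly from the unit sphere in R^n (normalized Gaussian)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm > 0.0:
            return [gi / norm for gi in g]


def zo_gradient(f_tilde, x, eta, batch_size, rng):
    """Mini-batch zeroth-order estimate built from function values only.

    Each sample evaluates f_tilde twice with the same noise sample xi:
        (n / eta) * (f_tilde(x + eta * v, xi) - f_tilde(x, xi)) * v
    """
    n = len(x)
    grad = [0.0] * n
    for _ in range(batch_size):
        v = sample_unit_sphere(n, rng)
        xi = rng.random()  # assumption: noise is a uniform scalar in [0, 1)
        shifted = [xj + eta * vj for xj, vj in zip(x, v)]
        scale = (n / eta) * (f_tilde(shifted, xi) - f_tilde(x, xi))
        for j in range(n):
            grad[j] += scale * v[j] / batch_size
    return grad


def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n (assumed feasible set)."""
    return [min(max(xj, lo), hi) for xj in x]


def projected_zo_descent(f_tilde, x0, eta, step, iters, rng, lo=-1.0, hi=1.0):
    """Projected descent using the estimator above with batch size k + 1."""
    x = project_box(x0, lo, hi)
    for k in range(iters):
        g = zo_gradient(f_tilde, x, eta, batch_size=k + 1, rng=rng)
        x = project_box([xj - step * gj for xj, gj in zip(x, g)], lo, hi)
    return x


def noisy_l1(x, xi):
    """Hypothetical nonsmooth test function: an l1 norm with multiplicative noise."""
    return (1.0 + 0.1 * (xi - 0.5)) * sum(abs(xj) for xj in x)


if __name__ == "__main__":
    rng = random.Random(0)
    x_final = projected_zo_descent(noisy_l1, [0.9, -0.7], eta=0.1,
                                   step=0.05, iters=100, rng=rng)
    print(x_final)
```

Only function evaluations are used, so the sketch applies even where ${\tilde f}$ is nondifferentiable. The smoothing parameter `eta` trades off approximation quality against the size of the estimator's scaling factor `n / eta`, which is consistent with the negative powers of $\eta$ in the complexity bounds quoted above.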

Stochastic Zeroth Order Gradient and Hessian Estimators: Variance Reduction and Refined Bias Bounds

On Sharp Stochastic Zeroth-Order Hessian Estimators over Riemannian Manifolds

Stochastic Sub-Sampled Newton Method with Variance Reduction

Variance-Reduced Gradient Estimator for Nonconvex Zeroth-Order Distributed Optimization

Zeroth-order Gradient and Quasi-Newton Methods for Nonsmooth Nonconvex Stochastic Optimization

Stochastic Zeroth-order Optimization Via Variance Reduction Method

Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation

Statistical Inference for Polyak-Ruppert Averaged Zeroth-order Stochastic Gradient Algorithm

Variance-reduced first-order methods for deterministically constrained stochastic nonconvex optimization with strong convergence guarantees

First and zeroth-order implementations of the regularized Newton method with lazy approximated Hessians

Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions

Zeroth-order Stochastic Cubic Newton Method with Low-rank Hessian Estimation

Unbiased least squares regression via averaged stochastic gradient descent

Small errors in random zeroth-order optimization are imaginary

Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function

Stochastic viscosity approximations of Hamilton-Jacobi equations and variance reduction

Zeroth-Order Hard-Thresholding: Gradient Error vs. Expansivity

Stochastic Second-order Methods for Non-convex Optimization with Inexact Hessian and Gradient

Stochastic Optimization for Nonconvex Problem with Inexact Hessian Matrix, Gradient, and Function

A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization

Stochastic Nested Variance Reduction for Nonconvex Optimization