Abstract: We present a novel gradient-free algorithm for solving convex stochastic optimization problems, such as those encountered in medicine, physics, and machine learning (e.g., the adversarial multi-armed bandit problem), where the objective function can only be evaluated through numerical simulation, either as the result of a real experiment or as feedback from an adversary's function evaluations. We therefore assume that only black-box access to the objective's function values is available, possibly corrupted by adversarial noise, which may be deterministic or stochastic. The noisy setup arises naturally from modeling randomness within a simulation, from computer discretization, when exact function values are withheld for privacy reasons, or when non-convex problems are solved as convex ones with an inexact function oracle. By exploiting higher-order smoothness, which holds, e.g., in logistic regression, we improve on the performance of zero-order methods developed under the classical smoothness assumption (i.e., a Lipschitz-continuous gradient). The proposed algorithm enjoys optimal oracle complexity and is designed for the overparameterized setup, i.e., when the number of model parameters is much larger than the size of the training dataset. Overparameterized models fit the training data perfectly while also generalizing well and outperforming underparameterized models on unseen data. We provide convergence guarantees for the proposed algorithm under both types of noise. Moreover, we estimate the maximum adversarial noise level at which the desired accuracy is still maintained in the Euclidean setup, and we then extend our results to a non-Euclidean setup. Our theoretical results are verified on the logistic regression problem.
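To make the black-box setting concrete, the following is a minimal sketch of a generic kernel-smoothed two-point zero-order gradient estimator, a construction commonly used in the zero-order literature to exploit higher-order smoothness. It is not taken from the paper and should not be read as the proposed algorithm: the names `kernel`, `zo_gradient`, `zo_sgd`, `make_noisy_oracle`, and `quadratic`, the choice of kernel (one standard option for smoothness order 3 or 4), the bounded uniform noise model, and all step-size and smoothing parameters are assumptions chosen for illustration.

```python
import math
import random


def kernel(r):
    # Assumed kernel for smoothness order 3 or 4. For r uniform on [-1, 1]
    # it satisfies E[K(r)] = 0, E[r K(r)] = 1, E[r^2 K(r)] = E[r^3 K(r)] = 0.
    return 15.0 * r / 4.0 * (5.0 - 7.0 * r * r)


def unit_sphere(d, rng):
    # Uniform direction on the Euclidean unit sphere (normalized Gaussian).
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def zo_gradient(oracle, x, h, rng):
    # Two-point estimate built only from (possibly noisy) function values.
    d = len(x)
    e = unit_sphere(d, rng)
    r = rng.uniform(-1.0, 1.0)
    x_plus = [xi + h * r * ei for xi, ei in zip(x, e)]
    x_minus = [xi - h * r * ei for xi, ei in zip(x, e)]
    scale = d / (2.0 * h) * (oracle(x_plus) - oracle(x_minus)) * kernel(r)
    return [scale * ei for ei in e]


def zo_sgd(oracle, x0, step, h, iters, seed=0):
    # Plain (non-accelerated) SGD driven by the zero-order estimator.
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = zo_gradient(oracle, x, h, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def make_noisy_oracle(f, delta, seed=1):
    # Hypothetical noise model: bounded noise of magnitude at most delta
    # added to every function evaluation.
    rng = random.Random(seed)

    def oracle(x):
        return f(x) + rng.uniform(-delta, delta)

    return oracle


def quadratic(x):
    # Toy convex objective used only for this illustration.
    return 0.5 * sum(c * c for c in x)


if __name__ == "__main__":
    oracle = make_noisy_oracle(quadratic, delta=1e-6)
    x_final = zo_sgd(oracle, [1.0] * 5, step=0.02, h=1e-2, iters=2000)
    print(quadratic(x_final))
```

In this sketch the noise enters the estimate scaled by roughly `d * delta / h`, which illustrates why the admissible noise level and the smoothing parameter `h` have to be balanced against the target accuracy.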

Laplacian smoothing gradient descent

Optimizing $(L_0, L_1)$-Smooth Functions by Gradient Methods

Laplacian Smoothing Stochastic Gradient Markov Chain Monte Carlo

A Deterministic Gradient-Based Approach to Avoid Saddle Points

Gaussian smoothing gradient descent for minimizing functions (GSmoothGD)

Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation

Non-Uniform Smoothness for Gradient Descent

Predictive Local Smoothness for Stochastic Gradient Methods

Accelerated zero-order SGD under high-order smoothness and overparameterized regime

Smooth over-parameterized solvers for non-smooth structured optimization

Improved Performance of Stochastic Gradients with Gaussian Smoothing

Diagonalisation SGD: Fast & Convergent SGD for Non-Differentiable Models via Reparameterisation and Smoothing

Local smoothness in variance reduced optimization

Stochastic Gradient Descent in the Viewpoint of Graduated Optimization

Aiming towards the minimizers: fast convergence of SGD for overparametrized problems

Smoothing $\mathcal{L}^2$ gradients in iterative regularization

Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization

Langevin Dynamics: A Unified Perspective on Optimization via Lyapunov Potentials

Random Smoothing Regularization in Kernel Gradient Descent Learning

Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions