Abstract: This work considers the nonconvex, nonsmooth problem of minimizing a composite objective of the form $f(g(x))+h(x)$ where the inner mapping $g$ is a smooth finite summation or expectation amenable to variance reduction. In such settings, prox-linear methods can enjoy variance-reduced speed-ups despite the existence of nonsmoothness. We provide a unified convergence theory applicable to a wide range of common variance-reduced vector and Jacobian constructions. Our theory (i) only requires operator norm bounds on Jacobians (whereas prior works used potentially much larger Frobenius norms), (ii) provides state-of-the-art high probability guarantees, and (iii) allows inexactness in proximal computations.

What problem does this paper attempt to address?

The paper asks how variance reduction can be used to speed up the minimization of nonconvex, nonsmooth composite objectives. Specifically, it considers problems of the form $\Phi(x) = f(g(x)) + h(x)$, where:

- $f: \mathbb{R}^m \rightarrow \mathbb{R}$ and $h: \mathbb{R}^n \rightarrow \mathbb{R}$ are convex functions,
- $g: \mathbb{R}^n \rightarrow \mathbb{R}^m$ is a smooth mapping that can be written as a finite sum or an expectation, which makes it amenable to variance-reduction techniques.

Although $f$ and $h$ are convex, the composite objective $\Phi$ may be neither convex nor smooth. This "convex composite" model is flexible enough to cover many practical problems, such as nonlinear programming and nonlinear equation solving/regression.

The main contribution of the paper is a unified convergence theory that applies to a wide range of common variance-reduced vector and Jacobian constructions. Specifically, the paper addresses the following key issues:

1. **Operator norm assumption**: The theory only requires a uniform bound on the Jacobian in the operator norm, whereas prior work typically bounds the potentially much larger Frobenius norm. Since the two norms can differ by a factor of up to $\sqrt{\min\{n, m\}}$, this can reduce the constants by a dimension-dependent factor.
2. **High-probability guarantee**: The paper provides state-of-the-art high-probability guarantees that an approximate stationary point is found.
3. **Inexact proximal computations**: The theory allows the proximal subproblems to be solved inexactly and still provides convergence guarantees.

Through these improvements, the paper fills gaps in the theory of previous work and provides better complexity results in some cases.
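
To make the operator-norm point concrete, the following minimal sketch numerically checks the standard inequality $\|J\|_{\mathrm{op}} \le \|J\|_F \le \sqrt{\min\{n, m\}}\,\|J\|_{\mathrm{op}}$. It is an illustration only, not code from the paper: the helper `norm_gap` and the random test matrix are hypothetical names and data chosen for this example.

```python
import numpy as np

def norm_gap(J):
    """Return (operator norm, Frobenius norm, sqrt(min(m, n))) for a matrix J."""
    op = np.linalg.norm(J, 2)        # largest singular value
    fro = np.linalg.norm(J, "fro")   # square root of the sum of squared entries
    return op, fro, np.sqrt(min(J.shape))

# Hypothetical 50 x 20 "Jacobian" with random entries (illustrative data only).
rng = np.random.default_rng(0)
J = rng.standard_normal((50, 20))
op, fro, cap = norm_gap(J)

# The Frobenius norm is sandwiched between the operator norm and
# sqrt(min(m, n)) times the operator norm.
assert op <= fro <= cap * op + 1e-12

# The identity matrix attains the upper bound: operator norm 1,
# Frobenius norm sqrt(20).
op_I, fro_I, cap_I = norm_gap(np.eye(20))
```

For the identity matrix the gap is as large as the inequality allows, which is the kind of case where a bound stated in the operator norm is tighter by the full dimension-dependent factor.
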
Specific applications include, but are not limited to:

- **Nonlinear programming**: Handling minimization problems with functional constraints.
- **Nonlinear equation solving/regression**: Solving systems of equations through sampling and first-order queries.

Overall, the paper aims to make nonconvex, nonsmooth composite optimization more efficient by combining variance reduction with prox-linear methods, which linearize the inner mapping $g$ at each step.
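
As a rough illustration of the kind of step a prox-linear method takes on a nonlinear equation-solving problem, the sketch below runs the deterministic update $x^+ = \arg\min_y f(g(x) + J(x)(y - x)) + h(y) + \frac{1}{2t}\|y - x\|^2$, where $J(x)$ denotes the Jacobian of $g$. It is not the paper's algorithm. To keep the subproblem in closed form it assumes $f(z) = \frac{1}{2}\|z\|^2$ and $h = 0$, a smooth special case; a nonsmooth $f$ would generally need an inner solver, and a variance-reduced method would replace $g(x)$ and $J(x)$ with stochastic estimators. The matrix `A`, the target `x_true`, the step size `t`, and all function names are hypothetical choices for this example.

```python
import numpy as np

# Hypothetical small system: find x with tanh(A x) = b.
A = np.array([[1.0, 0.5], [-0.5, 1.0]])
x_true = np.array([0.3, -0.2])
b = np.tanh(A @ x_true)

def g(x):
    """Inner mapping: residual of the nonlinear system."""
    return np.tanh(A @ x) - b

def jac(x):
    """Jacobian of g at x: diag(1 - tanh(Ax)^2) @ A."""
    s = 1.0 - np.tanh(A @ x) ** 2
    return s[:, None] * A

def objective(x):
    """Composite objective f(g(x)) with f(z) = 0.5 * ||z||^2 and h = 0."""
    return 0.5 * np.sum(g(x) ** 2)

def prox_linear_step(x, t):
    """Minimize 0.5 * ||g(x) + J d||^2 + ||d||^2 / (2 t) over d, then move to x + d."""
    J, r = jac(x), g(x)
    # First-order optimality condition: (J^T J + I / t) d = -J^T r.
    d = np.linalg.solve(J.T @ J + np.eye(x.size) / t, -J.T @ r)
    return x + d

x0 = np.zeros(2)
x = x0.copy()
for _ in range(50):
    x = prox_linear_step(x, t=1.0)
```

With these choices the step coincides with a damped Gauss-Newton (Levenberg-Marquardt style) update, and on this small example the iterates approach `x_true`.
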

Some Unified Theory for Variance Reduced Prox-Linear Methods

On Convergence Rates of Linearized Proximal Algorithms for Convex Composite Optimization with Applications.

Variance reduction techniques for stochastic proximal point algorithms

Variance Reduction and Low Sample Complexity in Stochastic Optimization via Proximal Point Method

Variance-Reduced Proximal Stochastic Gradient Descent for Non-convex Composite optimization.

Linear Convergence of Variance-Reduced Stochastic Gradient without Strong Convexity

Proximal Point Algorithms on Hadamard Manifolds: Linear Convergence and Finite Termination

Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient

Linearized Proximal Algorithms with Adaptive Stepsizes for Convex Composite Optimization with Applications

Variance reduced forward-reflected-backward algorithm for solving nonconvex finite-sum mixed variational inequalities

Convergence analysis of inexact proximal point algorithms on Hadamard manifolds

A Single-Loop Stochastic Proximal Quasi-Newton Method for Large-Scale Nonsmooth Convex Optimization

A Proximal Stochastic Quasi-Newton Algorithm

A proximal method for composite minimization

A globally convergent proximal Newton-type method in nonsmooth convex optimization

An Inexact Riemannian Proximal Gradient Method

A Unified Analysis on the Subgradient Upper Bounds for the Subgradient Methods Minimizing Composite Nonconvex, Nonsmooth and Non-Lipschitz Functions

A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization.

An inexact regularized proximal Newton method for nonconvex and nonsmooth optimization

Variational Analysis Perspective on Linear Convergence of Some First Order Methods for Nonsmooth Convex Optimization Problems

Smoothed Proximal Lagrangian Method for Nonlinear Constrained Programs