Some Unified Theory for Variance Reduced Prox-Linear Methods

Yue Wu, Benjamin Grimmer
2024-12-20
Abstract: This work considers the nonconvex, nonsmooth problem of minimizing a composite objective of the form $f(g(x))+h(x)$ where the inner mapping $g$ is a smooth finite summation or expectation amenable to variance reduction. In such settings, prox-linear methods can enjoy variance-reduced speed-ups despite the existence of nonsmoothness. We provide a unified convergence theory applicable to a wide range of common variance-reduced vector and Jacobian constructions. Our theory (i) only requires operator norm bounds on Jacobians (whereas prior works used potentially much larger Frobenius norms), (ii) provides state-of-the-art high probability guarantees, and (iii) allows inexactness in proximal computations.
Optimization and Control
What problem does this paper attempt to address?
This paper studies how to exploit variance reduction when minimizing nonconvex, nonsmooth composite objectives. Specifically, it considers problems of the form \( \Phi(x) = f(g(x))+h(x) \), where:

- \( f: \mathbb{R}^m\rightarrow\mathbb{R} \) and \( h: \mathbb{R}^n\rightarrow\mathbb{R} \) are convex functions,
- \( g: \mathbb{R}^n\rightarrow\mathbb{R}^m \) is a smooth inner mapping that can be written as a finite sum or an expectation, which makes it amenable to variance-reduction techniques.

Although \( f \) and \( h \) are convex, the composite objective \( \Phi \) may be neither convex nor smooth. This "convex composite" model is flexible enough to cover many practical applications, such as nonlinear programming and nonlinear equation solving/regression.

The main contribution of the paper is a unified convergence theory that applies to a wide range of common variance-reduced vector and Jacobian constructions. Specifically, the paper addresses the following key issues:

1. **Operator norm assumption**: The theory only requires a uniform bound on the Jacobian in the operator norm, whereas previous work usually relies on the potentially much larger Frobenius norm. This can improve the constants in the bounds by a dimension-dependent factor of up to \( \sqrt{\min\{n, m\}} \).
2. **High-probability guarantee**: The paper provides state-of-the-art high-probability guarantees that an approximate stationary point is found.
3. **Inexact proximal computations**: The theory allows the proximal subproblems to be solved inexactly and still provides convergence guarantees.

With these improvements, the paper fills gaps in previous work and gives better complexity results in some cases.
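To make the gap between the two Jacobian norms concrete, the following sketch compares them with NumPy. For any \( m \times n \) matrix \( J \), the standard inequality \( \|J\|_{\mathrm{op}} \le \|J\|_F \le \sqrt{\min\{n, m\}}\,\|J\|_{\mathrm{op}} \) holds, which is where the dimension-dependent factor comes from. The matrix `J` below is a random stand-in rather than a Jacobian from the paper, and the helper name `norm_gap` is hypothetical.

```python
import numpy as np


def norm_gap(J):
    """Return (operator norm, Frobenius norm, sqrt(min(m, n))) for a matrix J."""
    op = np.linalg.norm(J, 2)        # largest singular value of J
    fro = np.linalg.norm(J, "fro")   # square root of the sum of squared entries
    cap = np.sqrt(min(J.shape))      # worst-case ratio fro / op
    return op, fro, cap


rng = np.random.default_rng(0)
J = rng.standard_normal((50, 200))   # assumed shape: m = 50, n = 200
op, fro, cap = norm_gap(J)

# The ratio fro / op always lies between 1 and cap.
print(f"operator norm  = {op:.3f}")
print(f"Frobenius norm = {fro:.3f}")
print(f"ratio = {fro / op:.3f}, worst case = {cap:.3f}")
```

The identity matrix attains the worst case: its operator norm is 1 while its Frobenius norm is \( \sqrt{n} \).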
Specific applications include, but are not limited to:

- **Nonlinear programming**: Handling minimization problems with functional constraints.
- **Nonlinear equation solving/regression**: Solving systems of equations through sampling and first-order queries.

Overall, the paper aims to make solving nonconvex, nonsmooth optimization problems more efficient by combining variance reduction with prox-linear methods, which linearize the inner mapping \( g \) at each step.
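As a rough illustration of the kind of iteration involved, a prox-linear step replaces \( g \) by its linearization and solves \( \min_y f(\tilde g + \tilde J (y - x)) + h(y) + \tfrac{1}{2t}\|y - x\|^2 \), where \( \tilde g \) and \( \tilde J \) are stochastic estimates of \( g(x) \) and its Jacobian. The sketch below is a simplified stand-in and not the paper's algorithm: it assumes \( f(z) = \tfrac{1}{2}\|z\|^2 \) and \( h = 0 \) so that the subproblem has a closed form, it uses plain minibatch averages instead of a variance-reduced estimator, and the mapping \( g_i(x) = \tanh(A_i x) - b_i \), the step size `t`, and all function names are hypothetical.

```python
import numpy as np


def minibatch_estimates(x, A, b, batch):
    """Minibatch estimates of g(x) = (1/N) sum_i g_i(x) and its Jacobian.

    Assumed components: g_i(x) = tanh(A_i x) - b_i, with A_i of shape (m, n).
    """
    z = np.tanh(A[batch] @ x)                     # shape (B, m)
    g_hat = (z - b[batch]).mean(axis=0)           # shape (m,)
    # Jacobian of g_i is diag(1 - tanh(A_i x)^2) @ A_i.
    J_hat = ((1.0 - z**2)[:, :, None] * A[batch]).mean(axis=0)  # shape (m, n)
    return g_hat, J_hat


def prox_linear_step(x, g_val, J, t):
    """One prox-linear step for f(z) = 0.5 * ||z||^2 and h = 0.

    Solves min_y 0.5 * ||g_val + J (y - x)||^2 + (1 / (2 t)) * ||y - x||^2,
    whose minimizer is x - (J^T J + I / t)^{-1} J^T g_val.
    """
    n = x.shape[0]
    d = np.linalg.solve(J.T @ J + np.eye(n) / t, -J.T @ g_val)
    return x + d


rng = np.random.default_rng(0)
N, m, n = 100, 3, 5                       # assumed problem sizes
A = rng.standard_normal((N, m, n))
b = 0.1 * rng.standard_normal((N, m))

x = np.zeros(n)
for k in range(20):
    batch = rng.choice(N, size=10, replace=False)   # sample a minibatch
    g_hat, J_hat = minibatch_estimates(x, A, b, batch)
    x = prox_linear_step(x, g_hat, J_hat, t=0.5)
```

For a nonsmooth \( f \) or a nonzero \( h \), the subproblem generally has no closed form and would be handed to an inner convex solver, which is where tolerance for inexact proximal computations becomes relevant.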