Modified Gauss-Newton Algorithms under Noise

Krishna Pillutla, Vincent Roulet, Sham Kakade, Zaid Harchaoui

2023-05-18

Abstract: Gauss-Newton methods and their stochastic version have been widely used in machine learning and signal processing. Their nonsmooth counterparts, modified Gauss-Newton or prox-linear algorithms, can lead to contrasting outcomes when compared to gradient descent in large-scale statistical settings. We explore the contrasting performance of these two classes of algorithms in theory on a stylized statistical example, and experimentally on learning problems including structured prediction. In theory, we delineate the regime where the quadratic convergence of the modified Gauss-Newton method is active under statistical noise. In the experiments, we underline the versatility of stochastic (sub)-gradient descent to minimize nonsmooth composite objectives.

Optimization and Control, Machine Learning

What problem does this paper attempt to address?

The paper compares modified Gauss-Newton (prox-linear) algorithms with direct stochastic subgradient descent in the presence of statistical noise. Specifically:

1. **Research Background**:
   - Gauss-Newton methods and their variants (such as the Levenberg-Marquardt method) are widely used in machine learning and signal processing.
   - In large-scale statistical settings, modified Gauss-Newton or prox-linear algorithms can behave quite differently from gradient descent.
2. **Theoretical Analysis**:
   - The paper delineates the regime in which modified Gauss-Newton methods retain quadratic convergence under statistical noise.
   - Using a stylized statistical example, the paper quantifies whether the quadratic convergence of the exact prox-linear method takes effect before the iterates reach the noise level.
3. **Experimental Validation**:
   - The experiments demonstrate the flexibility of stochastic (sub)gradient descent in minimizing nonsmooth composite objectives.
   - On structured prediction problems, modified Gauss-Newton or prox-linear algorithms provide marginal gains in some cases, but direct stochastic subgradient descent is more flexible in handling complex learning problems.
4. **Main Findings**:
   - When the noise level is high, the local quadratic convergence advantage of modified Gauss-Newton methods may no longer be apparent.
   - For multi-output regression tasks, stochastic subgradient descent generally outperforms prox-linear algorithms under high signal-to-noise ratio conditions.
   - For path planning posed as a structured prediction problem, even simple constant step-size stochastic gradient descent can achieve good test error.
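To make the contrast between the two algorithm classes concrete, here is a minimal noise-free sketch. It is not the paper's example: the objective |c(x)| with c(x) = x² − 2, the step sizes, and all function names are assumptions chosen for illustration. For a scalar inner function, the prox-linear subproblem min over y of |c(x) + c'(x)(y − x)| + (y − x)²/(2γ) has a closed-form solution, which the code uses. Statistical noise is not modeled here.

```python
import math

def c(x):
    # Illustrative smooth inner function (assumption); the objective is |c(x)|.
    return x * x - 2.0

def dc(x):
    # Derivative of c.
    return 2.0 * x

def prox_linear_step(x, gamma):
    # Closed-form minimizer of |c(x) + c'(x) * d| + d**2 / (2 * gamma) over d,
    # valid for a scalar inner function: d = -t * c'(x), t clipped to [-gamma, gamma].
    a, g = c(x), dc(x)
    s = g * g
    if s == 0.0:
        return x  # linearization is flat, the step does not move
    t = max(-gamma, min(gamma, a / s))
    return x - t * g

def subgradient_step(x, eta):
    # One subgradient step on |c(x)|: a subgradient is sign(c(x)) * c'(x).
    a = c(x)
    sign = (a > 0) - (a < 0)
    return x - eta * sign * dc(x)

def run(step, x0, params):
    # Apply `step` once per entry of `params`; record the objective |c(x)|.
    x, errors = x0, []
    for p in params:
        x = step(x, p)
        errors.append(abs(c(x)))
    return x, errors

n_steps = 6
x_pl, err_pl = run(prox_linear_step, 2.0, [1.0] * n_steps)
x_sg, err_sg = run(subgradient_step, 2.0,
                   [0.1 / math.sqrt(k + 1) for k in range(n_steps)])
print("prox-linear |c(x)|:", err_pl)
print("subgradient |c(x)|:", err_sg)
```

When the clipping is inactive, the prox-linear step reduces to a Newton step on c(x) = 0, which is where the fast local convergence in this toy problem comes from. The subgradient iterates with decaying steps oscillate around the solution and shrink their error much more slowly. The paper asks whether that fast phase still pays off once the objective is observed through noisy samples, which this sketch leaves out.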
