Modified Gauss-Newton Algorithms under Noise

Krishna Pillutla, Vincent Roulet, Sham Kakade, Zaid Harchaoui
2023-05-18
Abstract: Gauss-Newton methods and their stochastic version have been widely used in machine learning and signal processing. Their nonsmooth counterparts, modified Gauss-Newton or prox-linear algorithms, can lead to contrasting outcomes when compared to gradient descent in large-scale statistical settings. We explore the contrasting performance of these two classes of algorithms in theory on a stylized statistical example, and experimentally on learning problems including structured prediction. In theory, we delineate the regime where the quadratic convergence of the modified Gauss-Newton method is active under statistical noise. In the experiments, we underline the versatility of stochastic (sub)-gradient descent to minimize nonsmooth composite objectives.
Optimization and Control, Machine Learning
What problem does this paper attempt to address?
The paper compares modified Gauss-Newton (prox-linear) algorithms with direct stochastic subgradient descent in the presence of statistical noise. Specifically:

1. **Research Background**:
   - Gauss-Newton methods and their variants (such as the Levenberg-Marquardt method) are widely used in machine learning and signal processing.
   - In large-scale statistical settings, modified Gauss-Newton (prox-linear) algorithms can lead to outcomes that contrast with those of gradient descent.
2. **Theoretical Analysis**:
   - The paper delineates the regime in which the quadratic convergence of the modified Gauss-Newton method is active under statistical noise.
   - Using a stylized statistical example, it quantifies whether the quadratic convergence of the exact prox-linear method takes effect before the iterates reach the noise level.
3. **Experimental Validation**:
   - The experiments demonstrate the versatility of stochastic (sub)gradient descent for minimizing nonsmooth composite objectives.
   - On structured prediction problems, modified Gauss-Newton (prox-linear) algorithms provide marginal gains in some cases, while direct stochastic subgradient descent is more flexible in handling complex learning problems.
4. **Main Findings**:
   - When the noise level is high, the local quadratic convergence advantage of modified Gauss-Newton methods may no longer be apparent.
   - For multi-output regression tasks, stochastic subgradient descent generally outperforms prox-linear algorithms under high signal-to-noise ratio conditions.
   - For path planning posed as a structured prediction problem, even simple constant step-size stochastic gradient descent can achieve good test error.
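To make the comparison concrete, here is a minimal sketch of one damped Gauss-Newton (Levenberg-Marquardt) step next to one gradient step on a noisy nonlinear least-squares problem. It is not the paper's setup. It covers only the smooth special case where the outer function is the squared norm, in which the prox-linear step reduces to a Levenberg-Marquardt step. The `tanh` model, the function names, and the values of `gamma`, `lr`, and `sigma` are all assumptions chosen for illustration.

```python
import numpy as np

def residual(w, X, y):
    # c(w): residuals of a hypothetical model y ≈ tanh(X w)
    return np.tanh(X @ w) - y

def jacobian(w, X):
    # Jacobian of c at w: diag(1 - tanh^2(X w)) X
    return (1.0 - np.tanh(X @ w) ** 2)[:, None] * X

def loss(w, X, y):
    # f(w) = (1 / 2n) * ||c(w)||^2
    return 0.5 * np.mean(residual(w, X, y) ** 2)

def lm_step(w, X, y, gamma=10.0):
    # Minimize the linearized model plus a proximal term:
    #   (1 / 2n) ||c(w) + J d||^2 + (1 / 2 gamma) ||d||^2
    # whose solution is d = -(J^T J / n + I / gamma)^{-1} J^T c(w) / n.
    n = len(y)
    r = residual(w, X, y)
    J = jacobian(w, X)
    A = J.T @ J / n + np.eye(w.size) / gamma
    return w - np.linalg.solve(A, J.T @ r / n)

def gd_step(w, X, y, lr=0.5):
    # Plain gradient step on f; the gradient is J^T c(w) / n.
    n = len(y)
    return w - lr * (jacobian(w, X).T @ residual(w, X, y)) / n

def run_demo(sigma=0.1, n=200, steps=20, seed=0):
    # Synthetic data with additive Gaussian noise of scale sigma (assumed).
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    w_star = np.array([0.5, -0.3, 0.2])
    y = np.tanh(X @ w_star) + sigma * rng.normal(size=n)

    w_lm = np.zeros(3)
    w_gd = np.zeros(3)
    initial = loss(w_lm, X, y)
    for _ in range(steps):
        w_lm = lm_step(w_lm, X, y)
        w_gd = gd_step(w_gd, X, y)
    return initial, loss(w_lm, X, y), loss(w_gd, X, y)

if __name__ == "__main__":
    initial, final_lm, final_gd = run_demo()
    print(f"initial loss {initial:.4f}, LM {final_lm:.4f}, GD {final_gd:.4f}")
```

Raising `sigma` in `run_demo` is one way to explore the question the summary raises: with noisy data, neither method can push the training loss much below the noise level, so fast local convergence may matter less.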