Abstract: In large-scale unconstrained optimization algorithms such as limited-memory BFGS (LBFGS), a common subproblem is a line search that minimizes the loss function along a descent direction. Commonly used line searches iteratively find an approximate solution that satisfies the Wolfe conditions, typically requiring multiple function and gradient evaluations per line search, which is expensive in parallel due to communication requirements. In this paper we propose a new line search approach for cases where the loss function is analytic, as in least squares regression, logistic regression, or low-rank matrix factorization. We approximate the loss function by a truncated Taylor polynomial, whose coefficients may be computed efficiently in parallel with less communication than evaluating the gradient, after which this polynomial may be minimized with high accuracy in a neighbourhood of the expansion point. Our Polynomial Expansion Line Search (PELS) was implemented in the Apache Spark framework and used to accelerate the training of a logistic regression model on binary classification datasets from the LIBSVM repository with LBFGS and the Nonlinear Conjugate Gradient (NCG) method. In large-scale numerical experiments in parallel on a 16-node cluster with 256 cores using the URL, KDDA, and KDDB datasets, the PELS approach produced significant convergence improvements compared to classical Wolfe line searches. For example, to reach the final training label prediction accuracies, LBFGS using PELS had speedup factors of 1.8 to 2 over LBFGS using a Wolfe line search, measured by both the number of iterations and the time required, due to the better accuracy of the step sizes computed in the line search. PELS has the potential to significantly accelerate large-scale regression and factorization computations, and is applicable to continuous optimization problems with smooth loss functions.
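
To make the idea in the abstract concrete, the following is a minimal single-process sketch of a polynomial expansion line search for L2-regularized logistic regression. It is not the paper's Spark implementation. The truncation degree (4), the function names (`taylor_coefficients`, `polynomial_step`), the labels in {-1, +1}, and the rule of taking the smallest positive local minimizer are all assumptions made for illustration. In a distributed setting, the per-example sums inside `taylor_coefficients` would presumably be computed per partition and reduced to a handful of scalars.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, X, y, lam):
    """Mean logistic loss (labels y in {-1, +1}) plus (lam / 2) * ||w||^2."""
    z = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -z)) + 0.5 * lam * (w @ w)

def taylor_coefficients(w, p, X, y, lam):
    """Coefficients c[0..4] of the degree-4 Taylor polynomial of
    alpha -> logistic_loss(w + alpha * p) around alpha = 0 (degree is an assumption)."""
    z = y * (X @ w)   # margins at the expansion point
    d = y * (X @ p)   # change in each margin per unit step along p
    s = sigmoid(z)
    v = s * (1.0 - s)
    # Derivatives of l(z) = log(1 + exp(-z)) of order 0..4, evaluated at z.
    derivs = [
        np.logaddexp(0.0, -z),
        s - 1.0,
        v,
        v * (1.0 - 2.0 * s),
        v * (1.0 - 6.0 * s + 6.0 * s ** 2),
    ]
    factorials = [1.0, 1.0, 2.0, 6.0, 24.0]
    c = np.array([np.mean(dk * d ** k) / factorials[k]
                  for k, dk in enumerate(derivs)])
    # The L2 term is exactly quadratic in alpha, so it needs no truncation.
    c[0] += 0.5 * lam * (w @ w)
    c[1] += lam * (w @ p)
    c[2] += 0.5 * lam * (p @ p)
    return c

def polynomial_step(c):
    """Smallest positive local minimizer of sum_k c[k] * alpha**k.
    Assumes p is a descent direction (c[1] < 0) and c[2] > 0; falls back to
    the minimizer of the quadratic model when no such critical point exists."""
    d1 = [4.0 * c[4], 3.0 * c[3], 2.0 * c[2], c[1]]   # first derivative, highest power first
    d2 = [12.0 * c[4], 6.0 * c[3], 2.0 * c[2]]        # second derivative
    candidates = [r.real for r in np.roots(d1)
                  if abs(r.imag) < 1e-12 and r.real > 0.0
                  and np.polyval(d2, r.real) > 0.0]
    if candidates:
        return min(candidates)
    return -c[1] / (2.0 * c[2])
```

Given a descent direction `p` at the current iterate `w`, a step would be obtained as `alpha = polynomial_step(taylor_coefficients(w, p, X, y, lam))`, with one pass over the data for the coefficients and no further function or gradient evaluations. Because the polynomial is only accurate near the expansion point, a practical implementation would likely need a safeguard for steps that land far outside that neighbourhood.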

A straightforward line search approach on the expected empirical loss for stochastic deep learning problems

Gradient-only line searches: An Alternative to Probabilistic Line Searches

Probabilistic Line Searches for Stochastic Optimization

Gradient Descent for Noisy Optimization

Stagewise Accelerated Stochastic Gradient Methods for Nonconvex Optimization

Efficient Line Search Method Based on Regression and Uncertainty Quantification

Robust Losses for Decision-Focused Learning

Effectively Leveraging Momentum Terms in Stochastic Line Search Frameworks for Fast Optimization of Finite-Sum Problems

A polynomial expansion line search for large-scale unconstrained minimization of smooth L2-regularized loss functions, with implementation in Apache Spark

Cross-Entropy Optimization for Hyperparameter Optimization in Stochastic Gradient-based Approaches to Train Deep Neural Networks

Optimal sampling for stochastic and natural gradient descent

Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation

Empirical Tests of Optimization Assumptions in Deep Learning

Efficient Loss Landscape Reshaping for Convolutional Neural Networks

The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms

GOALS: Gradient-Only Approximations for Line Searches Towards Robust and Consistent Training of Deep Neural Networks

A Data-Driven Line Search Rule for Support Recovery in High-Dimensional Data Analysis

Empirical Risk Minimization with Shuffled SGD: A Primal-Dual Perspective and Improved Bounds

Random Function Descent

A Note on Task-Aware Loss via Reweighing Prediction Loss by Decision-Regret

Learning Surrogate Losses