Abstract: A vast majority of machine learning algorithms train their models and perform inference by solving optimization problems. In order to capture the learning and prediction problems accurately, structural constraints such as sparsity or low rank are frequently imposed, or else the objective itself is designed to be a non-convex function. This is especially true of algorithms that operate in high-dimensional spaces or that train non-linear models such as tensor models and deep networks. The freedom to express the learning problem as a non-convex optimization problem gives immense modeling power to the algorithm designer, but such problems are often NP-hard to solve. A popular workaround has been to relax non-convex problems to convex ones and use traditional methods to solve the (convex) relaxed optimization problems. However, this approach may be lossy and still presents significant challenges for large-scale optimization. On the other hand, direct approaches to non-convex optimization have met with resounding success in several domains and remain the methods of choice for the practitioner, as they frequently outperform relaxation-based techniques; popular heuristics include projected gradient descent and alternating minimization. However, these heuristics are often poorly understood in terms of their convergence and other properties. This monograph presents a selection of recent advances that bridge a long-standing gap in our understanding of these heuristics. We hope that an insight into the inner workings of these methods will allow the reader to appreciate the unique marriage of task structure and generative models that allows these heuristic techniques to (provably) succeed. The monograph will lead the reader through several widely used non-convex optimization techniques, as well as applications thereof.
The goal of this monograph is both to introduce the rich literature in this area and to equip the reader with the tools and techniques needed to analyze these simple procedures for non-convex problems.
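The two heuristics named in the abstract can be illustrated with a minimal NumPy sketch. It is not the monograph's own code, and it rests on two assumed problem instances. For projected gradient descent, the problem is sparse least squares, where projecting onto the non-convex set of s-sparse vectors amounts to hard thresholding (iterative hard thresholding). For alternating minimization, the problem is a fully observed low-rank matrix that is fit as a product of two factors. The function names `hard_threshold`, `projected_gradient_descent` and `alternating_minimization`, the unit default step size, and the assumption that `X` is scaled so that `X.T @ X` is close to the identity on sparse vectors are all illustrative choices.

```python
import numpy as np


def hard_threshold(x, s):
    """Project x onto the (non-convex) set of s-sparse vectors.

    Keeps the s entries of largest magnitude and zeroes out the rest.
    """
    z = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[::-1][:s]  # indices of the s largest |x_i|
    z[idx] = x[idx]
    return z


def projected_gradient_descent(X, y, s, step=1.0, iters=500):
    """Approximately minimize 0.5 * ||y - X w||^2 subject to ||w||_0 <= s.

    Assumption (illustrative): X is scaled so that X.T @ X is near the
    identity on sparse vectors, which is what makes step=1.0 reasonable.
    """
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y)                 # gradient of the least-squares loss
        w = hard_threshold(w - step * grad, s)   # gradient step, then non-convex projection
    return w


def alternating_minimization(M, r, iters=50, seed=0):
    """Fit M ~= U @ V.T with rank-r factors by alternating least squares.

    Each sub-problem (U with V fixed, or V with U fixed) is an ordinary
    convex least-squares problem, even though the joint problem is not convex.
    """
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((M.shape[1], r))     # random initialization of V
    for _ in range(iters):
        # Fix V and solve min_U ||M.T - V U.T||_F for U.
        U = np.linalg.lstsq(V, M.T, rcond=None)[0].T
        # Fix U and solve min_V ||M - U V.T||_F for V.
        V = np.linalg.lstsq(U, M, rcond=None)[0].T
    return U, V
```

In this sketch, both routines attack the non-convex problem directly rather than through a convex relaxation. Each step is cheap: one thresholding operation per iteration, or one least-squares solve per factor. The unit step size is only safe when the design matrix behaves like an isometry on sparse vectors (a restricted isometry-type condition). For less well-conditioned designs, a smaller step would likely be needed.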

The Challenges of Optimization For Data Science

Learning Hard Optimization Problems: A Data Generation Perspective

Non-convex Optimization for Machine Learning

Gradient Descent in the Absence of Global Lipschitz Continuity of the Gradients

Optimizing $(L_0, L_1)$-Smooth Functions by Gradient Methods

Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant

Introduction to Nonsmooth Analysis and Optimization

Adaptive Algorithms for Relatively Lipschitz Continuous Convex Optimization Problems

The role of optimization in some recent advances in data-driven decision-making

Super Gradient Descent: Global Optimization requires Global Gradient

A gentle introduction to gradient-based optimization and variational inequalities for machine learning

Convex and Non-convex Optimization Under Generalized Smoothness

Empirical Tests of Optimization Assumptions in Deep Learning

A collection of challenging optimization problems in science, engineering and economics

Why Is Optimization Difficult?

Review Non-convex Optimization Method for Machine Learning

A geometric integration approach to smooth optimisation: Foundations of the discrete gradient method

Enhancing Deep Learning with Optimized Gradient Descent: Bridging Numerical Methods and Neural Network Training

Accelerated optimization algorithms and ordinary differential equations: the convex non Euclidean case

Derivative-Free Global Minimization in One Dimension: Relaxation, Monte Carlo, and Sampling

Recent Advances in Non-convex Smoothness Conditions and Applicability to Deep Linear Neural Networks