Abstract: Formulating real-world optimization problems often begins with making predictions from historical data (e.g., an optimizer that aims to recommend fast routes relies upon travel-time predictions). Typically, learning the prediction model used to generate the optimization problem and solving that problem are performed in two separate stages. Recent work has shown how such prediction models can be learned end-to-end by differentiating through the optimization task. Such methods often yield empirical improvements, which are typically attributed to end-to-end learning making better error tradeoffs than the standard loss function used in a two-stage solution. We refine this explanation and more precisely characterize when end-to-end learning can improve performance. When prediction targets are stochastic, a two-stage solution must make an a priori choice about which statistics of the target distribution to model (we consider expectations over prediction targets), while an end-to-end solution can make this choice adaptively. We show that the performance gap between a two-stage and an end-to-end approach is closely related to the price of correlation (POC) concept in stochastic optimization, and we show the implications of some existing POC results for the predict-then-optimize problem. We then consider a novel and particularly practical setting, where multiple prediction targets are combined to obtain each of the objective function's coefficients. We give explicit constructions where (1) two-stage performs unboundedly worse than end-to-end; and (2) two-stage is optimal. We use simulations to experimentally quantify performance gaps, and we identify a wide range of real-world applications from the literature whose objective functions rely on multiple prediction targets, suggesting that end-to-end learning could yield significant improvements.
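As a rough illustration of the multi-target setting described in the abstract, the following toy sketch shows how plugging separately predicted expectations into an objective can lead to a worse decision when the targets are correlated. It is not the paper's construction: the two actions, the product-form coefficient, the joint distribution, and the fixed cost of 1.5 are all assumptions chosen for illustration. The "best" choice here simply minimizes the true expected cost, standing in for what an adaptive, end-to-end approach could in principle recover.

```python
# Toy example (all values hypothetical): two prediction targets x and y
# that are perfectly correlated, listed as ((x, y), probability).
outcomes = [((0.0, 0.0), 0.5), ((2.0, 2.0), 0.5)]

FIXED_COST_B = 1.5  # assumed cost of the alternative action B


def expectation(fn):
    """Exact expectation of fn(x, y) under the joint distribution above."""
    return sum(p * fn(x, y) for (x, y), p in outcomes)


def true_expected_costs():
    # Action A's cost coefficient is the product x * y; action B's cost is fixed.
    return {"A": expectation(lambda x, y: x * y), "B": FIXED_COST_B}


def two_stage_costs():
    # Stage 1 models each target's expectation separately;
    # stage 2 plugs the two means into the objective.
    mean_x = expectation(lambda x, y: x)
    mean_y = expectation(lambda x, y: y)
    return {"A": mean_x * mean_y, "B": FIXED_COST_B}


def pick(costs):
    """Return the action with the lowest cost."""
    return min(costs, key=costs.get)


true_costs = true_expected_costs()
two_stage_choice = pick(two_stage_costs())  # decision from plugged-in means
best_choice = pick(true_costs)              # decision from true expected costs

print("two-stage picks", two_stage_choice, "with true cost", true_costs[two_stage_choice])
print("best action is", best_choice, "with true cost", true_costs[best_choice])
```

In this sketch the product of the means underestimates the expected product because the targets move together, so the plug-in decision selects the action that is actually more expensive in expectation.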

HUB: Guiding Learned Optimizers with Continuous Prompt Tuning

Accelerated Optimization in Deep Learning with a Proportional-Integral-Derivative Controller

LLM as a Complementary Optimizer to Gradient Descent: A Case Study in Prompt Tuning

Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers

Practical tradeoffs between memory, compute, and performance in learned optimizers

Narrowing the Focus: Learned Optimizers for Pretrained Models

Guarantees for Tuning the Step Size Using a Learning-to-Learn Approach

Transformer-Based Learned Optimization

Learning to Optimize for Reinforcement Learning

Large Language Models as Optimizers

Training Learned Optimizers with Randomly Initialized Learned Optimizers

The Perils of Learning Before Optimizing

Efficient Non-Parametric Optimizer Search for Diverse Tasks

Efficient hyperparameters optimization through model-based reinforcement learning with experience exploiting and meta-learning

A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases

A comparative study of recently deep learning optimizers

Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision

Reinforcement Learning Guided Spearman Dynamic Opposite Gradient-based Optimizer for Numerical Optimization and Anchor Clustering

Reverse engineering learned optimizers reveals known and novel mechanisms

When Gradient Descent Meets Derivative-Free Optimization: A Match Made in Black-Box Scenario

Learning Adaptive Hyper-Guidance Via Proxy-Based Bilevel Optimization for Image Enhancement