How to Boost Any Loss Function

Richard Nock, Yishay Mansour

2024-07-02

Abstract: Boosting is a highly successful ML-born optimization setting in which one is required to computationally efficiently learn arbitrarily good models based on access to a weak learner oracle, which provides classifiers performing at least slightly differently from random guessing. A key difference from gradient-based optimization is that boosting's original model does not require access to first-order information about a loss, yet over its decades-long history boosting has quickly evolved into a first-order optimization setting, sometimes even being wrongfully *defined* as such. Given recent progress extending gradient-based optimization to learn using only a loss's zeroth-order ($0^{th}$-order) information, this raises the question: which loss functions can be efficiently optimized with boosting, and what information is really needed for boosting to meet the requirements of the *original* boosting blueprint? We provide a constructive formal answer essentially showing that *any* loss function can be optimized with boosting. Boosting can thus achieve a feat not yet known to be possible in the classical $0^{th}$-order setting, since loss functions are not required to be convex, differentiable, or Lipschitz, and in fact not even continuous. Some of the tools we use are rooted in quantum calculus, the mathematical field (not to be confused with quantum computation) that studies calculus without passing to the limit, and thus without using first-order information.
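The abstract's reference to calculus "without passing to the limit" can be made concrete with two basic difference operators of quantum calculus: the h-derivative $D_h f(x) = (f(x+h) - f(x))/h$ and the q-derivative $D_q f(x) = (f(qx) - f(x))/((q-1)x)$. Both need only function values (zeroth-order information), so they stay well defined even for a discontinuous loss. The Python sketch below is illustrative only; it does not claim to reproduce the specific tools used in the paper, and the function names are hypothetical.

```python
def h_derivative(f, x, h):
    """h-derivative (forward difference quotient): a secant slope of f over [x, x + h].

    Uses only two evaluations of f; no limit h -> 0 is taken.
    """
    return (f(x + h) - f(x)) / h


def q_derivative(f, x, q):
    """q-derivative: secant slope of f between x and q * x (requires x != 0, q != 1)."""
    return (f(q * x) - f(x)) / ((q - 1) * x)


if __name__ == "__main__":
    square = lambda x: x * x
    print(h_derivative(square, 3.0, 0.5))   # 6.5, i.e. 2x + h
    print(q_derivative(square, 3.0, 1.5))   # 7.5, i.e. (q + 1) x

    # A discontinuous 0/1-style loss: no derivative at 0, but secants are fine.
    zero_one = lambda z: 1.0 if z <= 0 else 0.0
    print(h_derivative(zero_one, -0.5, 1.0))  # -1.0
```

For $f(x) = x^2$ the two operators give $2x + h$ and $(q+1)x$, which recover the ordinary derivative $2x$ as $h \to 0$ or $q \to 1$; the step loss shows a case where no ordinary derivative exists at the jump but the secant slope is still informative.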

Machine Learning

What problem does this paper attempt to address?

The paper mainly addresses the following issues:

1. **Exploring the applicability of Boosting algorithms within the zeroth-order optimization framework**: Most contemporary Boosting algorithms rely on first-order information (such as gradients), even though the original Boosting model does not require such information. The paper investigates how to optimize any loss function by accessing only its values (i.e., zeroth-order information).
2. **Proposing a new Boosting algorithm, SecBoost**: This algorithm optimizes without using derivatives of the loss function and applies to non-convex, non-differentiable, and even discontinuous loss functions.
3. **Theoretical analysis and guarantees**: The paper proves that SecBoost can effectively optimize loss functions with very general properties and analyzes its convergence rate.
4. **Handling non-standard loss functions**: The paper explores how to handle non-traditional loss functions in Boosting, which may lack the usually assumed properties (such as convexity or differentiability).

In summary, the main contribution of this paper is to extend Boosting algorithms to a much wider variety of loss functions. It introduces a new algorithm, SecBoost, along with the corresponding theoretical analysis, which helps advance optimization techniques in machine learning.
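To illustrate what "boosting with only loss values" can mean in practice, here is a hypothetical minimal sketch; it is not the authors' SecBoost, whose exact update rules are not described here. Where gradient boosting would weight example $i$ by $-F'(z_i)$ at margin $z_i = y_i H(x_i)$, this sketch uses the secant slope $-(F(z_i + v) - F(z_i))/v$ with a fixed offset $v$, and chooses each leveraging coefficient by a grid search over loss values. The loss (exponential loss plus a unit jump at zero, which is discontinuous and non-convex), the offset `v`, the 1-D decision-stump weak learner, and the grid of step sizes are all illustrative assumptions.

```python
import math


def loss(z):
    # Hypothetical margin loss: exponential loss plus a unit jump (0/1 loss) at z = 0.
    # Discontinuous and non-convex, but strictly decreasing in the margin z.
    return math.exp(-z) + (1.0 if z < 0 else 0.0)


def secant_slope(f, z, v):
    # Zeroth-order slope estimate from two loss values; no derivative is used.
    return (f(z + v) - f(z)) / v


def best_stump(xs, ys, ws):
    # Weak learner (assumption): 1-D stump h(x) = s if x > thr else -s,
    # chosen to maximize the weighted edge sum_i w_i * y_i * h(x_i).
    best = None
    for thr in sorted(set(xs)) + [min(xs) - 1.0]:
        for s in (1, -1):
            edge = sum(w * y * (s if x > thr else -s) for x, y, w in zip(xs, ys, ws))
            if best is None or edge > best[0]:
                best = (edge, thr, s)
    _, thr, s = best
    return lambda x: s if x > thr else -s


def zeroth_order_boost(xs, ys, rounds=10, v=0.5, alphas=None):
    # Candidate leveraging coefficients for the loss-value line search (assumption).
    alphas = alphas or [0.05 * k for k in range(1, 41)]
    learners = []               # list of (alpha, stump) pairs
    margins = [0.0] * len(xs)   # z_i = y_i * H(x_i), starting from H = 0
    for _ in range(rounds):
        # Example weights from secant slopes instead of gradients.
        ws = [-secant_slope(loss, z, v) for z in margins]
        h = best_stump(xs, ys, ws)
        preds = [y * h(x) for x, y in zip(xs, ys)]  # y_i * h(x_i)

        def total(a):
            # Total loss after adding a * h to the model: loss values only.
            return sum(loss(z + a * p) for z, p in zip(margins, preds))

        a = min(alphas, key=total)
        if total(a) >= total(0.0):
            break  # no candidate step decreases the loss; stop early
        learners.append((a, h))
        margins = [z + a * p for z, p in zip(margins, preds)]
    return learners


def predict(learners, x):
    return 1 if sum(a * h(x) for a, h in learners) > 0 else -1


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [-1, -1, -1, 1, 1, 1]
    model = zeroth_order_boost(xs, ys, rounds=5)
    print([predict(model, x) for x in xs])
```

Because `loss` is strictly decreasing, every secant slope is negative, so all example weights stay positive. For losses that are not monotone, weights can become negative; the edge maximization above still accepts them, but a real method would need the guarantees the paper develops.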
