How to Boost Any Loss Function

Richard Nock, Yishay Mansour
2024-07-02
Abstract: Boosting is a highly successful ML-born optimization setting in which one is required to computationally efficiently learn arbitrarily good models based on the access to a weak learner oracle, providing classifiers performing at least slightly differently from random guessing. A key difference with gradient-based optimization is that boosting's original model does not require access to first order information about a loss, yet the decades-long history of boosting has quickly evolved it into a first order optimization setting -- sometimes even wrongfully \textit{defining} it as such. Owing to recent progress extending gradient-based optimization to use only a loss' zeroth ($0^{th}$) order information to learn, this begs the question: what loss functions can be efficiently optimized with boosting and what is the information really needed for boosting to meet the \textit{original} boosting blueprint's requirements? We provide a constructive formal answer essentially showing that \textit{any} loss function can be optimized with boosting and thus boosting can achieve a feat not yet known to be possible in the classical $0^{th}$ order setting, since loss functions are not required to be convex, nor differentiable or Lipschitz -- and in fact not required to be continuous either. Some tools we use are rooted in quantum calculus, the mathematical field -- not to be confounded with quantum computation -- that studies calculus without passing to the limit, and thus without using first order information.
Machine Learning
What problem does this paper attempt to address?
The paper mainly addresses the following issues:

1. **Exploring the applicability of Boosting algorithms within the zeroth-order optimization framework**: Traditional Boosting algorithms rely on first-order information (such as gradients), but the original Boosting model does not require such information. The paper investigates how to optimize any loss function by only accessing the values of the loss function (i.e., zeroth-order information).
2. **Proposing a new Boosting algorithm, SecBoost**: This algorithm can optimize without the need for derivatives of the loss function and is suitable for non-convex, non-differentiable, and even discontinuous loss functions.
3. **Theoretical analysis and guarantees**: The paper provides theoretical proofs showing that the proposed SecBoost algorithm can effectively optimize loss functions with broad characteristics and offers an analysis of the convergence rate.
4. **Handling non-standard loss functions**: The paper explores how to handle non-traditional loss functions in Boosting, which may not possess the usually assumed properties (such as convexity, differentiability, etc.).

In summary, the main contribution of this paper is to extend the application range of Boosting algorithms, enabling them to work under a wider variety of loss functions. It introduces a new algorithm, SecBoost, along with corresponding theoretical analysis, which helps advance optimization techniques in machine learning.
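To make "zeroth-order information" and "calculus without passing to the limit" concrete, here is a minimal Python sketch of the finite-difference quotient that quantum calculus calls the h-derivative: the slope of the secant through two loss values. It is not SecBoost itself and is not taken from the paper. The names `secant_slope`, `zero_one_loss`, and `hinge_loss`, and the chosen offsets, are illustrative assumptions.

```python
def secant_slope(loss, z, v):
    """Secant slope (loss(z + v) - loss(z)) / v for a nonzero offset v.

    Only two loss *values* are queried, so no derivative of `loss` is needed.
    (Hypothetical helper for illustration; not the paper's algorithm.)
    """
    if v == 0:
        raise ValueError("offset v must be nonzero")
    return (loss(z + v) - loss(z)) / v


def zero_one_loss(z):
    # Discontinuous at z = 0: loss 1 for a non-positive margin, 0 otherwise.
    return 1.0 if z <= 0 else 0.0


def hinge_loss(z):
    # Continuous but not differentiable at z = 1.
    return max(0.0, 1.0 - z)


# Across the jump of the 0/1 loss, the secant slope is still well defined.
jump_slope = secant_slope(zero_one_loss, -0.5, 1.0)

# At the kink of the hinge loss, the slope depends on the sign of the offset.
kink_right = secant_slope(hinge_loss, 1.0, 0.5)
kink_left = secant_slope(hinge_loss, 1.0, -0.5)
```

The quotient stays meaningful exactly where a gradient does not exist (the jump of the 0/1 loss, the kink of the hinge loss), which is the kind of information the summary above describes as sufficient for optimizing non-differentiable or discontinuous losses.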