TRBoost: a generic gradient boosting machine based on trust-region method

Jiaqi Luo,Zihao Wei,Junkai Man,Shixin Xu
DOI: https://doi.org/10.1007/s10489-023-05000-w
IF: 5.3
2023-09-20
Applied Intelligence
Abstract:Gradient Boosting Machines (GBMs) have achieved remarkable success in effectively solving a wide range of problems by leveraging Taylor expansions in functional space. Second-order Taylor-based GBMs, such as XGBoost rooted in Newton's method, consistently yield state-of-the-art results in practical applications. However, it is important to note that the loss functions used in second-order GBMs must strictly adhere to convexity requirements, specifically requiring a positive definite Hessian of the loss. This restriction significantly narrows the range of objectives, thus limiting the application scenarios. In contrast, first-order GBMs are based on the first-order gradient optimization method, enabling them to handle a diverse range of loss functions. Nevertheless, their performance may not always meet expectations. To overcome this limitation, we introduce Trust-region Boosting (TRBoost), a new and versatile Gradient Boosting Machine that combines the strengths of second-order GBMs and the versatility of first-order GBMs. In each iteration, TRBoost employs a constrained quadratic model to approximate the objective and applies the Trust-region algorithm to obtain a new learner. Unlike GBMs based on Newton's method, TRBoost does not require a positive definite Hessian, enabling its application to more loss functions while achieving competitive performance similar to second-order algorithms. Convergence analysis and numerical experiments conducted in this study confirm that TRBoost exhibits similar versatility to first-order GBMs and delivers competitive results compared to second-order GBMs. Overall, TRBoost presents a promising approach that achieves a balance between performance and generality, rendering it a valuable addition to the toolkit of machine learning practitioners.
computer science, artificial intelligence
What problem does this paper attempt to address?