Robust boosting for regression problems

Xiaomeng Ju, Matías Salibián-Barrera
DOI: https://doi.org/10.1016/j.csda.2020.107065
2021-01-01
Abstract: Gradient boosting algorithms construct a regression predictor using a linear combination of "base learners". Boosting also offers an approach to obtaining robust non-parametric regression estimators that are scalable to applications with many explanatory variables. The robust boosting algorithm is based on a two-stage approach, similar to what is done for robust linear regression: it first minimizes a robust residual scale estimator, and then improves the resulting fit by optimizing a bounded loss function. Unlike previous robust boosting proposals, this approach does not require computing an ad-hoc residual scale estimator in each boosting iteration. Since the loss functions involved in this robust boosting algorithm are typically non-convex, a reliable initialization step is required, such as an L1 regression tree, which is also fast to compute. A robust variable importance measure can also be calculated via a permutation procedure. Thorough simulation studies and several data analyses show that, when no atypical observations are present, the robust boosting approach works as well as standard gradient boosting with a squared loss. Furthermore, when the data contain outliers, the robust boosting estimator outperforms the alternatives in terms of prediction error and variable selection accuracy.
statistics & probability; computer science, interdisciplinary applications
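
The second stage described in the abstract (boosting with a bounded loss from a robust starting fit) can be illustrated with a small sketch. This is not the authors' implementation, and several pieces are assumptions made only for illustration: the Tukey bisquare loss with tuning constant 4.685, the median of the responses as a stand-in for the L1 regression tree initialization, a normalized MAD as a stand-in for the residual scale that the paper obtains in its first stage, one-dimensional regression stumps as base learners, and a fixed step size instead of a line search. The function names `tukey_loss`, `tukey_psi`, `fit_stump`, and `robust_boost` are hypothetical.

```python
import statistics

def tukey_loss(r, c=4.685):
    """Tukey bisquare loss: constant (c^2/6) once |r| >= c, hence bounded."""
    if abs(r) >= c:
        return c * c / 6.0
    return (c * c / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3)

def tukey_psi(r, c=4.685):
    """Derivative of tukey_loss; exactly zero for |r| >= c."""
    if abs(r) >= c:
        return 0.0
    return r * (1.0 - (r / c) ** 2) ** 2

def fit_stump(x, g):
    """Least-squares regression stump on 1-D inputs x with targets g."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0
        left = [gi for xi, gi in zip(x, g) if xi <= t]
        right = [gi for xi, gi in zip(x, g) if xi > t]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((gi - ml) ** 2 for gi in left) + sum((gi - mr) ** 2 for gi in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda v: ml if v <= t else mr

def robust_boost(x, y, n_iter=50, lr=0.1):
    # Assumption: median start (the abstract suggests an L1 regression tree).
    f0 = statistics.median(y)
    # Assumption: fixed scale from the normalized MAD of the initial residuals,
    # standing in for the scale produced by the paper's first stage.
    scale = 1.4826 * statistics.median([abs(yi - f0) for yi in y])
    stumps = []
    pred = [f0] * len(y)
    for _ in range(n_iter):
        # Negative gradient of the bounded loss (up to a factor 1/scale):
        # observations with huge residuals get a pseudo-residual of zero.
        g = [tukey_psi((yi - pi) / scale) for yi, pi in zip(y, pred)]
        h = fit_stump(x, g)
        stumps.append(h)
        pred = [pi + lr * scale * h(xi) for pi, xi in zip(pred, x)]
    return lambda v: f0 + lr * scale * sum(h(v) for h in stumps)

# Step-shaped data with one gross outlier at x = 3.
x = list(range(20))
y = [1.0 if xi < 10 else 5.0 for xi in x]
y[3] = 100.0
model = robust_boost(x, y)
```

Because the derivative of the bounded loss vanishes for large residuals, the outlier at `x = 3` contributes nothing to the pseudo-residuals, so the fit on the left half of the data tracks the bulk of the observations rather than being pulled toward 100. The scale is computed once and reused, which mirrors the abstract's point that no residual scale needs to be re-estimated in each boosting iteration.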