Insurance Loss Modeling with Gradient Tree-Boosted Mixture Models

Yanxi Hou, Jiahong Li, Guangyuan Gao
DOI: https://doi.org/10.2139/ssrn.4062344
2022-01-01
SSRN Electronic Journal
Abstract: In actuarial practice, the mixture model is a widely applied statistical method for modeling insurance loss data. Although the Expectation-Maximization (EM) algorithm is an essential tool for parameter estimation in mixture models, it suffers from several issues that can lead to unstable predictions. For example, feature engineering and variable selection are two crucial modeling tasks that are challenging for mixture models because they involve several component models. Moreover, avoiding overfitting is another technical concern when predicting future losses. To address these issues, we propose an Expectation-Boosting (EB) algorithm, which replaces the maximization step of the EM algorithm with gradient boosting machines based on regression trees. The proposed EB algorithm estimates the mixing probabilities and the component regression functions non-parametrically and in an overfitting-sensitive manner, and it performs automated feature engineering, model fitting, and variable selection simultaneously, thereby fully exploring the predictive power of the feature space. Moreover, the algorithm can be combined with parallel computation methods to improve computational efficiency. Finally, we conduct two simulation studies to demonstrate the good performance of the proposed algorithm and present an empirical analysis of claim amount data for illustration.
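The idea of replacing the maximization step with a boosting step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-component Gaussian mixture with unit variances, a single feature on [0, 1], regression stumps as the base learners, and one boosting update per iteration. All names (`eb_fit`, `fit_stump`, the split grid, the learning rate) and the simulated data are hypothetical choices made for illustration only.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def normal_pdf(y, mu):
    # Unit-variance Gaussian density (variance fixed to keep the sketch short).
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2.0 * math.pi)


def fit_stump(x, target, weight):
    """Weighted least-squares regression stump (depth-1 tree) on one feature."""
    best = None
    for t in [q / 10 for q in range(1, 10)]:  # hypothetical split grid on [0, 1]
        sums = {True: [0.0, 0.0], False: [0.0, 0.0]}  # side -> [sum w, sum w*g]
        for xi, g, w in zip(x, target, weight):
            s = sums[xi <= t]
            s[0] += w
            s[1] += w * g
        left = sums[True][1] / (sums[True][0] + 1e-12)
        right = sums[False][1] / (sums[False][0] + 1e-12)
        sse = sum(w * (g - (left if xi <= t else right)) ** 2
                  for xi, g, w in zip(x, target, weight))
        if best is None or sse < best[0]:
            best = (sse, t, left, right)
    _, t, left, right = best
    return lambda xi: left if xi <= t else right


def eb_fit(x, y, n_iter=30, lr=0.3):
    n = len(y)
    ybar = sum(y) / n
    f = [0.0] * n            # logit of the mixing probability of component 1
    mu1 = [ybar - 1.0] * n   # component means, initialised apart to break symmetry
    mu2 = [ybar + 1.0] * n
    history = []             # observed log-likelihood before each boosting step
    for _ in range(n_iter):
        # Expectation step: posterior probability of component 1.
        p = [sigmoid(v) for v in f]
        d1 = [pi * normal_pdf(yi, m) for pi, yi, m in zip(p, y, mu1)]
        d2 = [(1.0 - pi) * normal_pdf(yi, m) for pi, yi, m in zip(p, y, mu2)]
        history.append(sum(math.log(a + b) for a, b in zip(d1, d2)))
        r = [a / (a + b) for a, b in zip(d1, d2)]
        # Boosting step: one stump per function, fitted to the gradient of the
        # expected complete-data log-likelihood, replaces the maximization step.
        h = fit_stump(x, [ri - pi for ri, pi in zip(r, p)], [1.0] * n)
        f = [fi + lr * h(xi) for fi, xi in zip(f, x)]
        h1 = fit_stump(x, [yi - m for yi, m in zip(y, mu1)], r)
        mu1 = [m + lr * h1(xi) for m, xi in zip(mu1, x)]
        h2 = fit_stump(x, [yi - m for yi, m in zip(y, mu2)], [1.0 - ri for ri in r])
        mu2 = [m + lr * h2(xi) for m, xi in zip(mu2, x)]
    return p, mu1, mu2, history


# Simulated data (an assumption): y could be read as log claim amounts.
random.seed(0)
x = [random.random() for _ in range(200)]
y = []
for xi in x:
    in_first = random.random() < (0.8 if xi < 0.5 else 0.3)
    y.append(random.gauss(1.0 + xi if in_first else 5.0 - xi, 1.0))

p, mu1, mu2, history = eb_fit(x, y)
```

Because each boosting update takes only a damped step toward the stump fit, the fitted functions grow gradually, and the number of iterations acts as a regularization parameter that could be chosen on validation data. The three stump fits within an iteration do not depend on one another, which is where a parallel implementation could apply.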