Abstract: Predictive analytics plays an important role in clinical research. An accurate predictive model can help clinicians stratify risk, thereby allowing the identification of a target population that might benefit from a certain intervention. Conventionally, predictive analytics is performed using parametric modeling, which comes with a number of assumptions. For example, generalized linear regression models require linearity and additivity to hold for the underlying data. However, these assumptions may not hold in practice. Especially in the era of big data, a large number of covariates or features can be extracted from an electronic database, and these covariates may involve complex interactions and higher-order terms. Conventional modeling methods have trouble capturing such high-dimensional relationships. However, some sophisticated machine learning techniques have been developed to handle this situation. Gradient boosting is one such technique: it recursively fits a weak learner to the residuals, so that model performance improves as the number of iterations gradually increases. It can automatically discover complex data structure, including nonlinearity and high-order interactions, even in the context of hundreds, thousands, or tens of thousands of potential predictors. This paper aims to introduce how gradient boosting works. The principles behind this learning method are explained with a small example in a step-by-step manner. The formal implementation of gradient tree boosting is then illustrated with the caret package. In the simulated example, a complex data structure is created by generating certain interactions between the covariates. This example shows that gradient boosting can capture these complex relationships better than a generalized linear model-based approach.
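The residual-fitting idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation (the paper uses the R package caret); it is a plain Python illustration under stated assumptions: squared-error loss, a single predictor, and one-split decision stumps as the weak learner. The function names (`fit_stump`, `predict_stump`, `boost`), the simulated sine-shaped data, and the settings `n_rounds=30` and `learning_rate=0.1` are all hypothetical choices made for the example.

```python
import math
import random


def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) for the one-split stump
    that minimizes squared error against the current residuals."""
    best = None
    # Try every observed value except the largest, so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    """Predict one value from a fitted stump."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=30, learning_rate=0.1):
    """Gradient boosting for squared-error loss: start from the mean,
    then repeatedly fit a stump to the residuals and add a shrunken copy."""
    baseline = sum(ys) / len(ys)
    predictions = [baseline] * len(ys)
    stumps, losses = [], []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
        stumps.append(stump)
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return baseline, stumps, losses


# Simulated nonlinear data (an assumption for illustration only).
random.seed(0)
xs = [6 * i / 49 for i in range(50)]
ys = [math.sin(x) + random.gauss(0, 0.1) for x in xs]

baseline, stumps, losses = boost(xs, ys)
print(f"training MSE after round 1: {losses[0]:.4f}")
print(f"training MSE after round {len(losses)}: {losses[-1]:.4f}")
```

Each round reduces the training error a little, because the stump is fitted to whatever the current ensemble still gets wrong and only a fraction (`learning_rate`) of its prediction is added. Training error alone says nothing about overfitting; in practice the number of rounds and the learning rate would be tuned on held-out data.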

Insurance Loss Modeling with Gradient Tree-Boosted Mixture Models

Bivariate Gamma Mixture of Experts Models for Joint Insurance Claims Modeling

Insurance Premium Prediction via Gradient Tree-Boosted Tweedie Compound Poisson Models

Combining Structural and Unstructured Data: A Topic-based Finite Mixture Model for Insurance Claim Prediction

Tweedie Gradient Boosting for Extremely Unbalanced Zero-inflated Data

Phase-type mixture-of-experts regression for loss severities

Stochastic gradient boosting frequency-severity model of insurance claims

Enhanced gradient boosting for zero-inflated insurance claims and comparative analysis of CatBoost, XGBoost, and LightGBM

Parameter Estimation of Poisson Mixture with Automated Model Selection Through BYY Harmony Learning

Mixed-Integer Convex Nonlinear Optimization with Gradient-Boosted Trees Embedded

Big Learning Expectation Maximization

A Posteriori Risk Classification and Ratemaking with Random Effects in the Mixture-of-Experts Model

Predictive analytics with gradient boosting in clinical medicine

Boosting insights in insurance tariff plans with tree-based machine learning methods

Model-based clustering and classification using mixtures of multivariate skewed power exponential distributions

An Adaptive Gradient BYY Learning Rule for Poisson Mixture with Automated Model Selection

Combining Predictions of Auto Insurance Claims

Scale-mixture Birnbaum-Saunders quantile regression models applied to personal accident insurance data

Zero-Inflated Tweedie Boosted Trees with CatBoost for Insurance Loss Analytics