Abstract: This work describes statistical modeling of detailed, microlevel automobile insurance records. We consider 1993–2001 data from a major insurance company in Singapore. By detailed microlevel records, we mean experience at the individual vehicle level, including vehicle and driver characteristics, insurance coverage, and claims experience, by year. The claims experience consists of detailed information on the type of insurance claim, such as whether the claim is due to injury to a third party, property damage to a third party, or damage to the insured, as well as the corresponding claim amount. We propose a hierarchical model with three components, corresponding to the frequency, type, and severity of claims. The first component is a negative binomial regression model for assessing claim frequency. The driver’s gender, age, and no-claims discount, as well as vehicle age and type, turn out to be important variables for predicting the event of a claim. The second is a multinomial logit model to predict the type of insurance claim: third-party injury, third-party property damage, insured’s own damage, or some combination. Year, vehicle age, and vehicle type turn out to be important predictors for this component. The third component models severity. Here we use a long-tailed generalized beta of the second kind (GB2) distribution for claim amounts and also incorporate predictor variables. Year, vehicle age, and person’s age turn out to be important predictors for this component. Not surprisingly, we show a significant dependence among the different claim types; we use a t-copula to account for this dependence. The three-component model provides justification for assessing the importance of a rating variable. When taken together, the integrated model allows more efficient prediction of automobile claims than traditional methods. Using simulation, we demonstrate this by developing predictive distributions and calculating premiums under alternative coverage limitations.
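
The frequency, type, and severity structure described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: every parameter value in `PARAMS` is hypothetical, the function names are invented for illustration, predictor variables are collapsed into fixed scores, and severities are drawn independently, so the t-copula dependence among claim types is deliberately left out. It uses only the Python standard library.

```python
import math
import random

# Claim-type combinations: third-party injury, own damage, third-party
# property damage, or several at once (labels are illustrative).
TYPES = [("injury",), ("own",), ("property",), ("injury", "own"),
         ("injury", "property"), ("own", "property"),
         ("injury", "own", "property")]

# All values below are hypothetical, chosen only to make the sketch run.
PARAMS = {
    "freq_mean": 0.1, "dispersion": 2.0,
    "type_scores": [-2.0, 1.0, 0.5, -3.0, -2.5, 0.0, -4.0],
    "severity_mu": {"injury": 8.0, "own": 7.0, "property": 7.0},
    "sigma": 0.5, "alpha1": 2.0, "alpha2": 3.0,
}

def draw_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def draw_frequency(rng, mean, dispersion):
    """Negative binomial count, drawn as a gamma-mixed Poisson."""
    mixing = rng.gammavariate(dispersion, 1.0 / dispersion)  # mean 1
    return draw_poisson(rng, mean * mixing)

def type_probabilities(scores):
    """Multinomial logit: softmax over one linear score per type."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def draw_gb2(rng, mu, sigma, alpha1, alpha2):
    """GB2 draw: exp(mu) * (G1 / G2) ** sigma with independent gammas."""
    g1 = rng.gammavariate(alpha1, 1.0)
    g2 = rng.gammavariate(alpha2, 1.0)
    return math.exp(mu) * (g1 / g2) ** sigma

def simulate_year(rng, params, deductible=0.0, limit=math.inf):
    """Insurer payout for one policy-year with a per-claim deductible and limit."""
    probs = type_probabilities(params["type_scores"])
    n_claims = draw_frequency(rng, params["freq_mean"], params["dispersion"])
    total = 0.0
    for _ in range(n_claims):
        combo = rng.choices(TYPES, weights=probs)[0]
        for claim_type in combo:
            # Independent draws: the t-copula dependence is NOT modeled here.
            loss = draw_gb2(rng, params["severity_mu"][claim_type],
                            params["sigma"], params["alpha1"], params["alpha2"])
            total += min(max(loss - deductible, 0.0), limit)
    return total

def pure_premium(params, n_sims=20000, seed=1, **coverage):
    """Monte Carlo estimate of the expected annual payout."""
    rng = random.Random(seed)
    return sum(simulate_year(rng, params, **coverage)
               for _ in range(n_sims)) / n_sims
```

Calling `pure_premium(PARAMS)` and `pure_premium(PARAMS, deductible=500.0)` with the same seed reuses the same simulated losses, so the difference between the two estimates reflects only the coverage change rather than sampling noise.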
