Toward an explainable machine learning model for claim frequency: a use case in car insurance pricing with telematics data

Arthur Maillart
DOI: https://doi.org/10.1007/s13385-021-00270-5
2021-03-19
European Actuarial Journal
Abstract: In this paper, we propose an explainable machine learning approach to modeling claim frequency on a telematics car insurance dataset. We first use a data-driven method based on tree ensembles, namely the random forest, to build a claim frequency model. We then present a method for building a single tree that faithfully synthesizes the predictions of a tree ensemble model, such as one derived from a random forest or gradient boosting. This tree serves as a global explanation of the black-box predictions, and this surrogate model lets us extract knowledge from the black-box tree ensemble. Finally, we apply this knowledge to a generalized linear model: integrating it into the model increases its predictive power.
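The surrogate-tree idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the `black_box` function below is a hypothetical stand-in for a fitted tree ensemble (an average of three hand-written stumps), the features `age` and `km` are invented for illustration, and the surrogate is a plain greedy regression tree fitted to the black box's predictions rather than to observed claims. The names `fit_tree`, `predict`, and `fidelity_gap` are likewise assumptions.

```python
def black_box(age, km):
    """Hypothetical stand-in for a tree-ensemble claim-frequency model.

    Averages three hand-written stumps, mimicking how an ensemble
    averages its trees. Thresholds and values are invented.
    """
    t1 = 0.12 if age < 25 else 0.05
    t2 = 0.10 if km > 300 else 0.04
    t3 = 0.15 if (age < 25 and km > 300) else 0.05
    return (t1 + t2 + t3) / 3


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(X, y, depth):
    """Greedy regression tree that minimises squared error at each split."""
    if depth == 0 or len(set(y)) == 1:
        return {"value": mean(y)}
    best = None
    for j in range(len(X[0])):
        # Skipping the smallest value keeps both children non-empty.
        for t in sorted({row[j] for row in X})[1:]:
            left = [yi for row, yi in zip(X, y) if row[j] < t]
            right = [yi for row, yi in zip(X, y) if row[j] >= t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # every feature is constant: nothing to split on
        return {"value": mean(y)}
    _, j, t = best
    lo = [(row, yi) for row, yi in zip(X, y) if row[j] < t]
    hi = [(row, yi) for row, yi in zip(X, y) if row[j] >= t]
    return {
        "feature": j,
        "threshold": t,
        "left": fit_tree([r for r, _ in lo], [v for _, v in lo], depth - 1),
        "right": fit_tree([r for r, _ in hi], [v for _, v in hi], depth - 1),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while "value" not in tree:
        go_left = row[tree["feature"]] < tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["value"]


# Illustrative grid of (age, km) profiles; not real telematics data.
X = [(age, km) for age in range(18, 71, 4) for km in range(50, 601, 50)]

# Key step: the surrogate is trained on the black box's predictions,
# not on observed claims, so it explains the model rather than the data.
y_bb = [black_box(age, km) for age, km in X]
surrogate = fit_tree(X, y_bb, depth=2)

# Fidelity: largest disagreement between surrogate and black box.
fidelity_gap = max(abs(predict(surrogate, row) - yb) for row, yb in zip(X, y_bb))
print(surrogate["feature"], surrogate["threshold"], fidelity_gap)
```

In this toy setting a shallow tree can reproduce the black box closely because the black box is itself piecewise constant. One plausible way to reuse such a tree, in the spirit of the abstract's last step, is to turn its split thresholds into indicator or interaction covariates for a generalized linear model; the exact integration procedure is described in the paper itself.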