Cost‐sensitive tree SHAP for explaining cost‐sensitive tree‐based models

Marija Kopanja, Stefan Hačko, Sanja Brdar, Miloš Savić
DOI: https://doi.org/10.1111/coin.12651
2024-06-11
Computational Intelligence
Abstract: Cost‐sensitive ensemble learning combines two approaches, ensemble learning and cost‐sensitive learning, and enables the generation of cost‐sensitive tree‐based ensemble models using the cost‐sensitive decision tree (CSDT) learning algorithm. In general, tree‐based models have a clear graphical representation that can explain a model's decision‐making process. However, the depth of the tree and the number of base models in the ensemble can limit how well the model's decision for each sample can be understood. CSDT models are widely used in finance (e.g., credit scoring and fraud detection) but lack effective explanation methods. We previously addressed this gap with the cost‐sensitive tree Shapley Additive Explanation method (CSTreeSHAP), an explanation method for the single‐tree CSDT model. Here, we extend that methodology to cost‐sensitive ensemble models, particularly cost‐sensitive random forest models. The paper details the theoretical foundation and implementation of CSTreeSHAP for both single CSDT and ensemble models. The usefulness of the proposed method is demonstrated by providing explanations for single and ensemble CSDT models trained on well‐known benchmark credit scoring datasets. Finally, we apply our methodology and analyze the stability of explanations for those models compared to cost‐insensitive tree‐based models. Our analysis reveals statistically significant differences between SHAP values despite seemingly similar global feature importance plots of the models. This highlights the value of our methodology as a comprehensive tool for explaining CSDT models.
computer science, artificial intelligence