Shapley Values for Explaining the Black Box Nature of Machine Learning Model Clustering

Mouad Louhichi, Redwane Nesmaoui, Marwan Mbarek, Mohamed Lazaar
DOI: https://doi.org/10.1016/j.procs.2023.03.107
2023-04-20
Procedia Computer Science
Abstract: Machine learning (ML) models are becoming increasingly complex. A sophisticated model (such as XGBoost or deep learning) generally yields more accurate predictions than a simple model (such as linear regression or a decision tree). There is therefore a trade-off between a model's performance and its interpretability: what a model gains in performance, it loses in interpretability (and vice versa), where interpretability is the ability of a human to understand the reasons for a model's decision. Explaining the predictions made by machine learning models involves computing and interpreting the importance of features. To this end, game theory has recently gained attention as a way to better understand the similarity between group members. In this paper, we use SHAP (SHapley Additive exPlanations), a method based on cooperative game theory, to analyze and evaluate the properties of each group. More importantly, we rely on k-means, PCA, and a LightGBM classifier to improve the data preparation before grouping the features into multiple clusters. The simulation results demonstrate the importance of Shapley values in creating an accurate and meaningful representation of each cluster.
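The cooperative-game idea behind SHAP can be illustrated with a minimal sketch of the exact Shapley value, where each feature is treated as a player and its attribution is its weighted average marginal contribution over all coalitions of the other features. This is not the authors' pipeline and not the SHAP library's approximation; the function `shapley_values`, the payoff function `toy_value`, and the feature names `x1`, `x2`, `x3` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a small cooperative game.

    players: list of hashable player ids (here: feature names).
    value:   function mapping a frozenset of players to a payoff.
    Enumerates all coalitions, so it is only practical for few players.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when joining coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical payoff: two useful features with an interaction term."""
    payoff = 0.0
    if "x1" in coalition:
        payoff += 10.0
    if "x2" in coalition:
        payoff += 20.0
    if "x1" in coalition and "x2" in coalition:
        payoff += 6.0  # interaction, shared equally by x1 and x2
    return payoff


features = ["x1", "x2", "x3"]
phi = shapley_values(features, toy_value)
for name in features:
    print(name, round(phi[name], 6))
```

In this toy game the attributions sum to the payoff of the full feature set, and the unused feature `x3` receives zero, which are the efficiency and null-player properties that make Shapley values attractive for feature attribution.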