Abstract: Shapley values originated in cooperative game theory but are now extensively used in industry and academia as a model-agnostic framework for explaining predictions made by complex machine learning models. There are several algorithmic approaches for computing different versions of Shapley value explanations. Here, we consider Shapley values that incorporate feature dependencies, referred to as conditional Shapley values, for predictive models fitted to tabular data. Estimating precise conditional Shapley values is difficult because it requires estimating non-trivial conditional expectations. In this article, we develop new methods, extend previously proposed approaches, and systematize the new, refined, and existing methods into different method classes for comparison and evaluation. The method classes use either Monte Carlo integration or regression to model the conditional expectations. We conduct extensive simulation studies to evaluate how precisely the different method classes estimate the conditional expectations, and thereby the conditional Shapley values, in different setups. We also apply the methods to several real-world data experiments and provide recommendations for when to use the different method classes and approaches. Roughly speaking, we recommend parametric methods when the data distribution can be specified almost correctly, as they generally produce the most accurate Shapley value explanations. When the distribution is unknown, both generative methods and regression models with a form similar to the underlying predictive model are good and stable options. Regression-based methods are often slow to train but produce the Shapley value explanations quickly once trained. The opposite holds for Monte Carlo-based methods, making the different methods appropriate in different practical situations.
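The Monte Carlo method class can be illustrated in its simplest parametric form. This is a minimal sketch, assuming the features are jointly Gaussian with known mean `mu` and covariance `sigma`. The contribution function v(S) = E[f(x) | x_S = x*_S] is estimated by sampling the unconditioned features from their Gaussian conditional distribution. The Shapley values then follow from the exact Shapley formula over all 2^M coalitions. The function names, the linear model `f`, and the parameter values are hypothetical illustrations, not the article's implementation. In practice, `mu` and `sigma` would be estimated from training data, and enumerating every coalition is only feasible for a small number of features M.

```python
import itertools
import math

import numpy as np


def conditional_gaussian(mu, sigma, S, x_star):
    """Parameters of p(x_Sbar | x_S = x_star_S) when x ~ N(mu, sigma) (assumed Gaussian)."""
    S = sorted(S)
    Sbar = [j for j in range(len(mu)) if j not in S]
    if not S:  # empty coalition: the marginal distribution of all features
        return Sbar, mu.copy(), sigma.copy()
    s_SS = sigma[np.ix_(S, S)]
    s_bS = sigma[np.ix_(Sbar, S)]
    s_bb = sigma[np.ix_(Sbar, Sbar)]
    K = s_bS @ np.linalg.inv(s_SS)
    cond_mu = mu[Sbar] + K @ (x_star[S] - mu[S])
    cond_cov = s_bb - K @ s_bS.T
    return Sbar, cond_mu, (cond_cov + cond_cov.T) / 2  # symmetrize for sampling


def contribution_mc(f, mu, sigma, S, x_star, n_samples, rng):
    """Monte Carlo estimate of v(S) = E[f(x) | x_S = x_star_S]."""
    if len(S) == len(mu):  # all features conditioned on: no integration needed
        return float(f(x_star[None, :])[0])
    Sbar, cond_mu, cond_cov = conditional_gaussian(mu, sigma, S, x_star)
    X = np.tile(x_star, (n_samples, 1))  # keep x_S fixed at the explained values
    X[:, Sbar] = rng.multivariate_normal(cond_mu, cond_cov, size=n_samples)
    return float(f(X).mean())


def shapley_from_contributions(v, M):
    """Exact Shapley formula, given v(S) for every coalition S (keyed by frozenset)."""
    phi = np.zeros(M)
    for j in range(M):
        others = [k for k in range(M) if k != j]
        for r in range(M):
            w = math.factorial(r) * math.factorial(M - r - 1) / math.factorial(M)
            for S in itertools.combinations(others, r):
                S = frozenset(S)
                phi[j] += w * (v[S | {j}] - v[S])
    return phi


def conditional_shapley_mc(f, mu, sigma, x_star, n_samples=5000, seed=0):
    """Conditional Shapley values for one observation x_star (Gaussian MC sketch)."""
    rng = np.random.default_rng(seed)
    M = len(mu)
    # Estimate v(S) once per coalition so every Shapley term reuses the same value.
    v = {
        frozenset(S): contribution_mc(f, mu, sigma, S, x_star, n_samples, rng)
        for r in range(M + 1)
        for S in itertools.combinations(range(M), r)
    }
    return shapley_from_contributions(v, M), v


if __name__ == "__main__":
    beta = np.array([1.0, -2.0, 0.5])
    f = lambda X: X @ beta  # hypothetical linear predictive model
    mu = np.zeros(3)
    sigma = np.array([[1.0, 0.7, 0.0],
                      [0.7, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])  # features 0 and 1 are correlated
    x_star = np.array([1.0, 1.0, 1.0])
    phi, v = conditional_shapley_mc(f, mu, sigma, x_star)
    print(np.round(phi, 2))
```

With independent features (`sigma` equal to the identity) and a linear model, the estimates should approach beta_j (x*_j − mu_j). With the correlated `sigma` above, the conditional Shapley values for features 0 and 1 shift away from that, because conditioning on one feature changes the expected value of the other. Computing v(S) once per coalition also keeps the efficiency property exact: the Shapley values sum to v(all) − v(∅).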

K-Fold Cross-Valuation for Machine Learning Using Shapley Value

Evaluate the Contribution of Multiple Participants in Federated Learning

CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification

Absolute Shapley Value

Data valuation: The partial ordinal Shapley value for machine learning

Equitable Valuation of Crowdsensing for Machine Learning via Game Theory

Dynamic Shapley Value Computation

Towards Data Valuation via Asymmetric Data Shapley

Data Valuation by Leveraging Global and Local Statistical Information

EcoVal: An Efficient Data Valuation Framework for Machine Learning

Variance reduced Shapley value estimation for trustworthy data valuation

CHG Shapley: Efficient Data Valuation and Selection towards Trustworthy Machine Learning

DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation

2D-Shapley: A Framework for Fragmented Data Valuation

On the Inflation of KNN-Shapley Value

Optimizing Federated Learning on Non-IID Data Using Local Shapley Value

A Principled Approach to Data Valuation for Federated Learning

A comparative study of methods for estimating model-agnostic Shapley value explanations

Data Valuation for Vertical Federated Learning: A Model-free and Privacy-preserving Method

A Comparative Study of Methods for Estimating Conditional Shapley Values and When to Use Them