Abstract: A key challenge for decision makers when incorporating black-box machine-learned models into practice is understanding the predictions these models provide. One set of methods proposed to address this challenge is to train surrogate explainer models that approximate how the more complex model computes its predictions. Explainer methods are generally classified as either local or global, depending on what portion of the data space they purport to explain. The improved coverage of global explainers usually comes at the expense of explainer fidelity (i.e., how well the explainer's predictions match those of the black-box model). One way of combining the advantages of both approaches is to aggregate several local explainers into a single explainer model with improved coverage. However, the problem of aggregating these local explainers is computationally challenging, and existing methods only use heuristics to form these aggregations. In this paper, we propose a local explainer aggregation method which selects local explainers using non-convex optimization. In contrast to existing heuristic methods, we use an integer optimization framework to combine local explainers into a near-global aggregate explainer. Our framework allows a decision maker to directly trade off coverage and fidelity of the resulting aggregation through the parameters of the optimization problem. We also propose a novel local explainer algorithm based on information filtering. We evaluate our algorithmic framework on two healthcare datasets: the Parkinson's Progression Marker Initiative (PPMI) dataset and a geriatric mobility dataset from the UCI machine learning repository. Our choice of these healthcare-related datasets is motivated by the anticipated need for explainable precision medicine. We find that our method outperforms existing local explainer aggregation methods in terms of both fidelity and coverage of classification. It also improves on fidelity over existing global explainer methods, particularly in multi-class settings, where state-of-the-art methods achieve 70% fidelity and ours achieves 90%.
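To make the idea of selecting local explainers by integer optimization concrete, here is a minimal toy sketch. It is not the formulation used in the paper: the function `select_explainers`, the `covers` input, and the `budget` parameter are all hypothetical names, and the selection is done by exhaustive enumeration rather than with an integer programming solver. Each set in `covers` is assumed to hold the indices of data points that one local explainer both covers and predicts in agreement with the black-box model, so maximizing the union rewards coverage only where fidelity holds.

```python
from itertools import combinations


def select_explainers(covers, budget):
    """Pick at most `budget` explainers that maximize faithfully covered points.

    covers[j] is assumed to be the set of point indices that hypothetical
    local explainer j covers AND predicts in agreement with the black box.
    Exhaustive search stands in for an integer programming solver here.
    """
    best_subset, best_covered = (), set()
    for k in range(1, budget + 1):
        for subset in combinations(range(len(covers)), k):
            # Union of the points faithfully explained by the chosen explainers.
            covered = set().union(*(covers[j] for j in subset))
            if len(covered) > len(best_covered):
                best_subset, best_covered = subset, covered
    return best_subset, best_covered


# Toy data (made up for illustration): four explainers over six points.
covers = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {0, 5}]
subset, covered = select_explainers(covers, budget=2)
print(subset, sorted(covered))
```

In this toy setting the `budget` plays the role of a tuning parameter: a larger budget can only increase coverage, at the cost of a less compact aggregate explainer. Enumeration scales combinatorially, which is one way to see why the aggregation problem is computationally challenging in general.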

Ranking by Aggregating Referees: Evaluating the Informativeness of Explanation Methods for Time Series Classification

Robust explainer recommendation for time series classification

Which Neural Network Makes More Explainable Decisions? An Approach Towards Measuring Explainability

Improving the Evaluation and Actionability of Explanation Methods for Multivariate Time Series Classification

Evaluating the Explainability of Neural Rankers

Evaluating Local Model-Agnostic Explanations of Learning to Rank Models with Decision Paths

SEGAL time series classification - Stable explanations using a generative model and an adaptive weighting method for LIME

Explaining deep multi-class time series classifiers

Evaluating Recurrent Neural Network Explanations

SSET: Swapping-Sliding Explanation for Time Series Classifiers in Affect Detection

Comparison of feature importance measures as explanations for classification models

Optimal Local Explainer Aggregation for Interpretable Prediction

Learning to Rank Rationales for Explainable Recommendation

EXS: Explainable Search Using Local Model Agnostic Interpretability

On the Relationship between Explanation and Recommendation: Learning to Rank Explanations for Improved Performance

REVEL Framework to Measure Local Linear Explanations for Black-Box Models: Deep Learning Image Classification Case Study

A Song of (Dis)agreement: Evaluating the Evaluation of Explainable Artificial Intelligence in Natural Language Processing

Sustainable Transparency in Recommender Systems: Bayesian Ranking of Images for Explainability

Visual Explanations with Attributions and Counterfactuals on Time Series Classification

On The Coherence of Quantitative Evaluation of Visual Explanations