Abstract: Neural networks are increasingly popular thanks to their exceptional performance on many real-world problems. At the same time, they have been shown to be vulnerable to attacks, difficult to debug, and subject to fairness issues. To improve people’s trust in the technology, it is often necessary to provide some human-understandable explanation of neural networks’ decisions, e.g., why was my loan application rejected whereas hers was approved? That is, stakeholders want to minimize the chance that a decision cannot be explained consistently, and would like to know, before a neural network is deployed, how often and how easily its decisions can be explained. In this work, we provide two measurements of the decision explainability of neural networks. We then develop algorithms that automatically evaluate these measurements on user-provided neural networks. We evaluate our approach on multiple neural network models trained on benchmark datasets. The results show that existing neural networks’ decisions often have low explainability according to our measurements. This is in line with the observation that adversarial samples, which are often hard to explain, can be easily generated through adversarial perturbation. Further experiments show that the decisions of models trained with robust training are not necessarily easier to explain, whereas the decisions of models retrained with samples generated by our algorithms are easier to explain.

Can I Trust the Explainer? Verifying Post-hoc Explanatory Methods

Which Neural Network Makes More Explainable Decisions? An Approach Towards Measuring Explainability

How Well Do Feature-Additive Explainers Explain Feature-Additive Predictors?

Can you trust your explanations? A robustness test for feature attribution methods

The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets

Helpful, Misleading or Confusing: How Humans Perceive Fundamental Building Blocks of Artificial Intelligence Explanations

Altruist: Argumentative Explanations through Local Interpretations of Predictive Models

Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations

Solving the enigma: Deriving optimal explanations of deep networks

Are Objective Explanatory Evaluation metrics Trustworthy? An Adversarial Analysis

Trustworthy Conceptual Explanations for Neural Networks in Robot Decision-Making

The future of human-centric eXplainable Artificial Intelligence (XAI) is not post-hoc explanations

Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications

Verifying Relational Explanations: A Probabilistic Approach

Towards a Unified Framework for Evaluating Explanations

Explaining Explanations: An Overview of Interpretability of Machine Learning

Benchmarking and survey of explanation methods for black box models

Analysis of Explainers of Black Box Deep Neural Networks for Computer Vision: A Survey

On Evaluating Explanation Utility for Human-AI Decision Making in NLP

Deceptive AI Explanations: Creation and Detection

Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations