Abstract: Explanations are hypothesized to improve human understanding of machine learning models and achieve a variety of desirable outcomes, ranging from model debugging to enhancing human decision making. However, empirical studies have found mixed and even negative results. An open question, therefore, is under what conditions explanations can improve human understanding and in what way. Using adapted causal diagrams, we provide a formal characterization of the interplay between machine explanations and human understanding, and show how human intuitions play a central role in enabling human understanding. Specifically, we identify three core concepts of interest that cover all existing quantitative measures of understanding in the context of human-AI decision making: task decision boundary, model decision boundary, and model error. Our key result is that without assumptions about task-specific intuitions, explanations may improve human understanding of the model decision boundary, but they cannot improve human understanding of the task decision boundary or model error. To achieve complementary human-AI performance, we articulate possible ways in which explanations need to work with human intuitions. For instance, human intuitions about the relevance of features (e.g., education is more important than age in predicting a person's income) can be critical in detecting model error. We validate the importance of human intuitions in shaping the outcome of machine explanations with empirical human-subject studies. Overall, our work provides a general framework along with actionable implications for future algorithmic development and empirical experiments of machine explanations.
