Abstract: Explainable artificial intelligence and interpretable machine learning are research domains growing in importance. Yet, the underlying concepts remain somewhat elusive and lack generally agreed definitions. While recent inspiration from social sciences has refocused the work on needs and expectations of human recipients, the field still misses a concrete conceptualisation. We take steps towards addressing this challenge by reviewing the philosophical and social foundations of human explainability, which we then translate into the technological realm. In particular, we scrutinise the notion of algorithmic black boxes and the spectrum of understanding determined by explanatory processes and explainees' background knowledge. This approach allows us to define explainability as (logical) reasoning applied to transparent insights (into, possibly black-box, predictive systems) interpreted under background knowledge and placed within a specific context -- a process that engenders understanding in a selected group of explainees. We then employ this conceptualisation to revisit strategies for evaluating explainability as well as the much disputed trade-off between transparency and predictive power, including its implications for ante-hoc and post-hoc techniques along with fairness and accountability established by explainability. We furthermore discuss components of the machine learning workflow that may be in need of interpretability, building on a range of ideas from human-centred explainability, with a particular focus on explainees, contrastive statements and explanatory processes. Our discussion reconciles and complements current research to help better navigate open questions -- rather than attempting to address any individual issue -- thus laying a solid foundation for a grounded discussion and future progress of explainable artificial intelligence and interpretable machine learning.

Disagreement amongst counterfactual explanations: how transparency can be misleading

Disagreement amongst counterfactual explanations: How transparency can be deceptive

Manipulation Risks in Explainable AI: The Implications of the Disagreement Problem

Can Explainable AI Explain Unfairness? A Framework for Evaluating Explainable AI

A Survey of Contrastive and Counterfactual Explanation Generation Methods for Explainable Artificial Intelligence

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective

Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review

Critical Empirical Study on Black-box Explanations in AI

Stop ordering machine learning algorithms by their explainability! A user-centered investigation of performance and explainability

Privacy Implications of Explainable AI in Data-Driven Systems

Explainability Is in the Mind of the Beholder: Establishing the Foundations of Explainable Artificial Intelligence

Explanation matters: An experimental study on explainable AI

Features of Explainability: How users understand counterfactual and causal explanations for categorical and continuous features in XAI

Don't be Fooled: The Misinformation Effect of Explanations in Human-AI Collaboration

Explaining Black-Box Algorithms Using Probabilistic Contrastive Counterfactuals

Explainable artificial intelligence (XAI) post-hoc explainability methods: risks and limitations in non-discrimination law

A Turing Test for Transparency

Explaining Explanations: An Overview of Interpretability of Machine Learning

Adequate and fair explanations

One Explanation Does Not Fit All: The Promise of Interactive Explanations for Machine Learning Transparency

The privacy issue of counterfactual explanations: explanation linkage attacks