Abstract: Explainable AI is an evolving area concerned with understanding the decision making of machine learning models so that these models are more transparent, accountable, and understandable to humans. In particular, post-hoc model-agnostic interpretable AI techniques explain the decisions of a black-box ML model locally, for a single instance, without knowledge of the model's internals. Despite their simplicity and their ability to provide valuable insights, existing approaches fail to deliver consistent and reliable explanations. Moreover, in the context of black-box classifiers, existing approaches justify the predicted class but do not ensure that the explanation scores differ strongly from those of another class. In this work we propose a novel post-hoc model-agnostic XAI technique that provides contrastive explanations, justifying the classification of a black-box classifier along with reasoning as to why another class was not predicted. Our method, CLIMAX (Contrastive Label-aware Influence-based Model Agnostic XAI), is based on local classifiers. To ensure model fidelity of the explainer, we require the perturbations to yield a class-balanced surrogate dataset. To this end, we employ a label-aware surrogate data generation method based on random oversampling and Gaussian Mixture Model sampling. Further, we propose influence subsampling to retain effective samples and hence control sample complexity. We show that CLIMAX achieves better consistency than baselines such as LIME, BayLIME, and SLIME. We also present results on text and image datasets, where we generate contrastive explanations for any black-box classification model for which one can only query the class probabilities for an instance of interest.
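As a rough illustration of the class-balancing idea mentioned in the abstract, the sketch below perturbs an instance of interest, labels the perturbations by querying class probabilities, and then randomly oversamples the minority class. It is a minimal sketch under stated assumptions, not the CLIMAX implementation: `black_box_proba`, `perturb`, `balance_by_oversampling`, the Gaussian perturbation scale, and the toy decision rule are all hypothetical, and the Gaussian Mixture Model sampling and influence subsampling steps are not covered.

```python
import math
import random
from collections import Counter


def black_box_proba(x):
    """Hypothetical black-box classifier: returns class probabilities only.

    Assumption: a toy two-class rule chosen so that labels near x0 are imbalanced.
    """
    p1 = 1.0 / (1.0 + math.exp(-4.0 * (sum(x) - 1.5)))
    return [1.0 - p1, p1]


def perturb(instance, n_samples, scale, rng):
    """Draw Gaussian perturbations around the instance of interest."""
    return [[v + rng.gauss(0.0, scale) for v in instance] for _ in range(n_samples)]


def balance_by_oversampling(samples, labels, rng):
    """Duplicate randomly chosen samples of smaller classes until all classes match the largest."""
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for y, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for s in group + extra:
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels


rng = random.Random(0)
x0 = [0.5, 0.5]  # instance of interest (illustrative values)
samples = perturb(x0, n_samples=500, scale=0.5, rng=rng)

# Label each perturbation with the black box's most probable class.
labels = []
for s in samples:
    probs = black_box_proba(s)
    labels.append(max(range(len(probs)), key=lambda k: probs[k]))

raw_counts = Counter(labels)
balanced_samples, balanced_labels = balance_by_oversampling(samples, labels, rng)
balanced_counts = Counter(balanced_labels)
```

Random oversampling can only duplicate samples of classes that already appear among the perturbations; a local classifier fitted on `balanced_samples` and `balanced_labels` would then see each class equally often.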

Let the CAT out of the bag: Contrastive Attributed explanations for Text

Model Agnostic Contrastive Explanations for Structured Data

Contrastive Corpus Attribution for Explaining Representations

CLIMAX: An exploration of Classifier-Based Contrastive Explanations

KACE: Generating Knowledge-Aware Contrastive Explanations for Natural Language Inference

Explaining Black-box Models for Biomedical Text Classification

Explaining NLP Models via Minimal Contrastive Editing (MiCE)

A Comparative Analysis of Counterfactual Explanation Methods for Text Classifiers

Sim2Word: Explaining Similarity with Representative Attribute Words via Counterfactual Explanations

Explaining with Counter Visual Attributes and Examples

CELL your Model: Contrastive Explanations for Large Language Models

Explaining short text classification with diverse synthetic exemplars and counter-exemplars

Explaining black box text modules in natural language with language models

Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training

Counterfactual-based Saliency Map: Towards Visual Contrastive Explanations for Neural Networks

Rather a Nurse than a Physician -- Contrastive Explanations under Investigation

Counterfactual Contrastive Learning for Robust Text Classification

Explaining Text Classifiers with Counterfactual Representations

Distinguish Before Answer: Generating Contrastive Explanation As Knowledge for Commonsense Question Answering

Comparative Study of Language Models on Cross-Domain Data with Model Agnostic Explainability

Towards Explainable Computerized Adaptive Testing with Large Language Model