Abstract: Deep convolutional neural networks have proven effective and are widely regarded as the dominant method for image classification. However, a severe drawback of deep convolutional neural networks is their poor explainability. In many real-world applications, users need to understand the rationale behind a prediction before deciding whether to trust it. To address this issue, a novel genetic algorithm-based method is proposed, for the first time, to automatically evolve local explanations that help users assess the rationality of the predictions. Furthermore, the proposed method is model-agnostic, i.e., it can be used to explain any deep convolutional neural network model. In the experiments, ResNet is used as the example model to be explained, and ImageNet is selected as the benchmark dataset. DenseNet and MobileNet are also explained to demonstrate the model-agnostic character of the proposed method. The evolved local explanations for four images, randomly selected from ImageNet, are presented and show that the explanations are easy for humans to recognise. Moreover, the evolved explanations account well for the predictions of the deep convolutional neural networks on all four images by capturing meaningful, interpretable features of the sample images. Further analysis based on 30 runs of the experiments shows that the evolved local explanations can also increase the probabilities (confidences) with which the deep convolutional neural network models make their predictions. The proposed method can obtain local explanations within one minute, which is more than ten times faster than LIME (the state-of-the-art method).
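The abstract does not spell out the encoding or the fitness function, so the following is only a minimal sketch of how a genetic algorithm might evolve a local explanation for a black-box classifier. It assumes, hypothetically, that an explanation is a binary mask over image segments, that fitness is the model's confidence on the masked image minus a small penalty per kept segment, and that the model is reached only through a `predict` callable (which is what makes the search model-agnostic). The names `evolve_mask`, `toy_predict`, and all parameter values are illustrative and are not taken from the paper.

```python
import random


def evolve_mask(predict, n_segments, pop_size=20, generations=30,
                mutation_rate=0.1, sparsity_weight=0.01, seed=0):
    """Evolve a binary segment mask that keeps the black box confident.

    predict: callable taking a 0/1 list (kept segments) and returning the
             model's confidence for the originally predicted class.
    """
    rng = random.Random(seed)

    def fitness(mask):
        # Assumed objective: high confidence with as few segments as possible.
        return predict(mask) - sparsity_weight * sum(mask)

    population = [[rng.randint(0, 1) for _ in range(n_segments)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_segments)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


def toy_predict(mask):
    # Stand-in for a real CNN: confidence depends only on segments 2 and 5.
    return 0.1 + 0.45 * mask[2] + 0.45 * mask[5]


best = evolve_mask(toy_predict, n_segments=10)
kept_segments = [i for i, g in enumerate(best) if g == 1]
```

In a real setting, `predict` would mask out the unselected segments of the image, run the network, and return the probability of the class being explained. On the toy model here, the search should favour masks that keep segments 2 and 5, since those are the only ones that affect the confidence.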

Model-Agnostic Local Explanations with Genetic Algorithms for Text Classification

Local Rule-Based Explanations of Black Box Decision Systems

XPROAX-Local explanations for text classification with progressive neighborhood approximation

Transparent Neighborhood Approximation for Text Classifier Explanation

LaPLACE: Probabilistic Local Model-Agnostic Causal Explanations

Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training

Local Explanation of Dialogue Response Generation

Explaining Deep Convolutional Neural Networks for Image Classification by Evolving Local Interpretable Model-agnostic Explanations

Contextual Local Explanation for Black Box Classifiers

A General Search-based Framework for Generating Textual Counterfactual Explanations

Explaining Black-box Models for Biomedical Text Classification

Towards LLM-guided Causal Explainability for Black-box Text Classifiers

Learning Model Agnostic Explanations via Constraint Programming

Model Agnostic Local Explanations of Reject

Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals

Model Agnostic Multilevel Explanations

Generating Counterfactual Explanations with Natural Language

Locally Invariant Explanations: Towards Stable and Unidirectional Explanations through Local Invariant Learning

Evaluating the Correctness of Explainable AI Algorithms for Classification

Explain the Explainer: Interpreting Model-Agnostic Counterfactual Explanations of a Deep Reinforcement Learning Agent

Explaining short text classification with diverse synthetic exemplars and counter-exemplars