Abstract: In this paper, we introduce PASTA (Perceptual Assessment System for explanaTion of Artificial intelligence), a novel framework for a human-centric evaluation of XAI techniques in computer vision. Our first key contribution is a human evaluation of XAI explanations on four diverse datasets (COCO, Pascal Parts, Cats Dogs Cars, and MonumAI) which constitutes the first large-scale benchmark dataset for XAI, with annotations at both the image and concept levels. This dataset allows for robust evaluation and comparison across various XAI methods. Our second major contribution is a data-based metric for assessing the interpretability of explanations. It mimics human preferences, based on a database of human evaluations of explanations in the PASTA-dataset. With its dataset and metric, the PASTA framework provides consistent and reliable comparisons between XAI techniques, in a way that is scalable but still aligned with human evaluations. Additionally, our benchmark allows for comparisons between explanations across different modalities, an aspect previously unaddressed. Our findings indicate that humans tend to prefer saliency maps over other explanation types. Moreover, we provide evidence that human assessments show a low correlation with existing XAI metrics that are numerically simulated by probing the model.

What problem does this paper attempt to address?

The main problem this paper addresses is the weakness of current evaluation methods for Explainable Artificial Intelligence (XAI) techniques, especially the lack of a standardized evaluation framework grounded in human perception. Specifically:

1. **Lack of a Standard Evaluation Framework from the Perspective of Human Perception**: Existing evaluations of XAI techniques rely mainly on computational metrics and simulated evaluations, and ignore how human users understand and prefer explanations. As a result, evaluation results are disconnected from user experience in practical applications.
2. **High-cost and Unreliable Evaluation**: Evaluation methods that rely on manual annotation are costly and time-consuming, and they are easily affected by annotator-specific factors (such as fatigue and available time), which makes the results inconsistent and unreliable.

To solve these problems, the authors propose a new framework, PASTA (Perceptual Assessment System for explanaTion of Artificial intelligence), which aims to simulate human evaluations of XAI explanations in an automated manner and thereby provide an evaluation standard that is better aligned with human cognition.

The main contributions of the PASTA framework include:

- **Constructing a Large-scale Benchmark Dataset**: It draws on four image datasets from different domains (COCO, Pascal Parts, Cats Dogs Cars, and MonumAI). These datasets have image-level and concept-level annotations and can be used to evaluate multiple XAI methods.
- **Developing a Data-based Metric**: The metric provides an automated scoring system that imitates human preferences for explanations. It is based on a large number of human evaluations collected in the PASTA dataset, which supports consistent and reliable evaluation.
- **Comparing the Performance of Different XAI Methods**: A large-scale evaluation of 21 XAI methods finds that humans tend to prefer saliency maps over other types of explanations.
- **Revealing the Low Correlation between Existing XAI Metrics and Human Evaluations**: Existing numerically simulated metrics (such as ROAD) correlate only weakly with human evaluations, indicating that these metrics may miss important aspects of human perception.

In conclusion, by introducing the PASTA framework, this paper brings the human-perception perspective into the evaluation of XAI techniques and provides a more comprehensive and reliable evaluation tool that is aligned with human cognition.
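
To make the low-correlation finding concrete, the following minimal sketch shows one common way to compare human ratings with an automated metric: a Spearman rank correlation. This is an illustration only; the summary above does not say which correlation statistic the authors used, and the function names and score values here are hypothetical, not taken from the PASTA dataset.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-explanation scores (illustrative values only).
human_scores = [4.2, 3.1, 4.8, 2.5, 3.9]        # e.g. mean human rating
metric_scores = [0.30, 0.55, 0.35, 0.50, 0.20]  # e.g. an automated metric

rho = spearman(human_scores, metric_scores)
print(f"Spearman correlation: {rho:.2f}")
```

A value of `rho` near 0 would mean the automated metric ranks explanations very differently from human raters, while a value near 1 would mean the two rankings largely agree.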

Benchmarking XAI Explanations with Human-Aligned Evaluations

XAI Benchmark for Visual Explanation

Precise Benchmarking of Explainable AI Attribution Methods

OpenXAI: Towards a Transparent Evaluation of Model Explanations

BEExAI: Benchmark to Evaluate Explainable AI

Assessing Fidelity in XAI post-hoc techniques: A Comparative Study with Ground Truth Explanations Datasets

OpenHEXAI: An Open-Source Framework for Human-Centered Evaluation of Explainable Machine Learning

EXACT: Towards a platform for empirically benchmarking Machine Learning model explanation methods

User-centric evaluation of explainability of AI with and for humans: a comprehensive empirical study

Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI Evaluation Methods into an Interactive and Multi-dimensional Benchmark

Explainable Artificial Intelligence: Evaluating the Objective and Subjective Impacts of xAI on Human-Agent Interaction

XAI Handbook: Towards a Unified Framework for Explainable AI

Navigating the Maze of Explainable AI: A Systematic Approach to Evaluating Methods and Metrics

A survey on XAI and natural language explanations

XC: Exploring Quantitative Use Cases for Explanations in 3D Object Detection

Towards Symbolic XAI -- Explanation Through Human Understandable Logical Relationships Between Features

Exploring Evaluation Methodologies for Explainable AI: Guidelines for Objective and Subjective Assessment

Classification Metrics for Image Explanations: Towards Building Reliable XAI-Evaluations

Interpretability is in the eye of the beholder: Human versus artificial classification of image segments generated by humans versus XAI

XAI-TRIS: Non-linear image benchmarks to quantify false positive post-hoc attribution of feature importance

Human attention guided explainable artificial intelligence for computer vision models