Abstract: The rise of deep learning has brought significant progress in computer vision (CV) tasks, yet the "black box" nature of these models often precludes interpretability. This challenge has spurred the development of Explainable Artificial Intelligence (XAI), which generates explanations of an AI model's decision-making process. An explanation should not only faithfully reflect the model's true reasoning process (i.e., faithfulness) but also align with human reasoning (i.e., alignment). Within XAI, visual explanations use visual cues to elucidate the reasoning behind machine learning models, particularly in image processing, by highlighting the image regions most important to a prediction. Despite the considerable body of research on visual explanations, standardized benchmarks for evaluating them remain seriously underdeveloped. In particular, to evaluate alignment, existing works usually either illustrate visual explanations for a few images or recruit human raters to report explanation quality through ad-hoc questionnaires. Neither approach yields a standardized, quantitative, and comprehensive evaluation. To address this issue, we develop a benchmark for visual explanation consisting of eight datasets from various domains with human explanation annotations, accommodating both post-hoc and intrinsic visual explanation methods. Additionally, we devise a visual explanation pipeline that includes data loading, explanation generation, and method evaluation. Our proposed benchmark facilitates a fair evaluation and comparison of visual explanation methods. Building on our curated collection of datasets, we benchmark eight existing visual explanation methods and conduct a thorough comparison across four selected datasets using six alignment-based and causality-based metrics. Our benchmark will be accessible through our website: https://xaidataset.github.io.
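To make the notion of a quantitative alignment score concrete, the following is a minimal sketch of one common choice, intersection over union (IoU) between a thresholded saliency map and a binary human annotation mask. It is an illustration under stated assumptions, not the benchmark's own implementation: the abstract does not name its six metrics, and the function name `alignment_iou`, the min-max normalization, and the `threshold` parameter are all hypothetical.

```python
def alignment_iou(saliency, human_mask, threshold=0.5):
    """IoU between a thresholded saliency map and a binary human annotation.

    Hypothetical helper for illustration only; both inputs are 2D nested
    lists of the same shape.
    """
    flat_s = [float(v) for row in saliency for v in row]
    flat_h = [bool(v) for row in human_mask for v in row]
    if len(flat_s) != len(flat_h):
        raise ValueError("saliency and human_mask must have the same size")

    # Min-max normalize the saliency values to [0, 1] (an assumption).
    lo, hi = min(flat_s), max(flat_s)
    span = hi - lo
    # A constant map carries no ranking information, so it selects nothing.
    model = [span > 0 and (v - lo) / span >= threshold for v in flat_s]

    inter = sum(m and h for m, h in zip(model, flat_h))
    union = sum(m or h for m, h in zip(model, flat_h))
    # Two empty masks agree trivially.
    return inter / union if union else 1.0


# Toy example: the explanation highlights three pixels, the human marks two.
saliency = [[0.9, 0.8, 0.1],
            [0.7, 0.2, 0.0],
            [0.1, 0.0, 0.0]]
human = [[1, 1, 0],
         [0, 0, 0],
         [0, 0, 0]]
score = alignment_iou(saliency, human)
```

In the toy example the thresholded explanation covers both annotated pixels plus one extra pixel, so the score is 2/3. A score like this can be averaged over every annotated image in a dataset, which is what makes the comparison standardized rather than anecdotal.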

BenchXAI: Comprehensive Benchmarking of Post-hoc Explainable AI Methods on Multi-Modal Biomedical Data

Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI Evaluation Methods into an Interactive and Multi-dimensional Benchmark

Unbox the black-box for the medical explainable AI via multi-modal and multi-centre data fusion: A mini-review, two showcases and beyond

Explainable artificial intelligence (XAI) in deep learning-based medical image analysis

BEExAI: Benchmark to Evaluate Explainable AI

A Comparative Approach to Explainable Artificial Intelligence Methods in Application to High-Dimensional Electronic Health Records: Examining the Usability of XAI

A Comparative Study and Systematic Analysis of XAI Models and their Applications in Healthcare

Explainable AI applications in the Medical Domain: a systematic review

A Survey on Medical Explainable AI (XAI): Recent Progress, Explainability Approach, Human Interaction and Scoring System

A Survey on Explainable Artificial Intelligence (XAI): Towards Medical XAI

Application of Example-Based Explainable Artificial Intelligence (XAI) for Analysis and Interpretation of Medical Imaging: A Systematic Review

Evaluation of Popular XAI Applied to Clinical Prediction Models: Can They be Trusted?

XAI Benchmark for Visual Explanation

Explainable AI for Medical Data: Current Methods, Limitations, and Future Directions

Precise Benchmarking of Explainable AI Attribution Methods

Explainable artificial intelligence (XAI) in radiology and nuclear medicine: a literature review

Unveiling the black box: A systematic review of Explainable Artificial Intelligence in medical image analysis

XAI-TRIS: Non-linear image benchmarks to quantify false positive post-hoc attribution of feature importance

OpenHEXAI: An Open-Source Framework for Human-Centered Evaluation of Explainable Machine Learning

A review of evaluation approaches for explainable AI with applications in cardiology

A Brief Review of Explainable Artificial Intelligence in Healthcare