BenchXAI: Comprehensive Benchmarking of Post-hoc Explainable AI Methods on Multi-Modal Biomedical Data

Jacqueline Metsch,Anne-Christin Hauschild
DOI: https://doi.org/10.1101/2024.12.20.629677
2024-12-23
Abstract:The increasing digitalisation of multi-modal data in medicine and novel artificial intelligence (AI) algorithms opens up a large number of opportunities for predictive models. In particular, deep learning models show great performance in the medical field. A major limitation of such powerful but complex models originates from their 'black-box' nature. Recently, a variety of explainable AI (XAI) methods have been introduced to address this lack of transparency and trust in medical AI. However, the majority of such methods have solely been evaluated on single data modalities. Meanwhile, with the increasing number of XAI methods, integrative XAI frameworks and benchmarks are essential to compare their performance on different tasks. For that reason, we developed BenchXAI, a novel XAI benchmarking package supporting comprehensive evaluation of fifteen XAI methods, investigating their robustness, suitability, and limitations in biomedical data. We employed BenchXAI to validate these methods in three common biomedical tasks, namely clinical data, medical image and signal data, and biomolecular data. Our newly designed sample-wise normalisation approach for post-hoc XAI methods enables the statistical evaluation and visualisation of performance and robustness. We found that the XAI methods Integrated Gradients, DeepLift, DeepLiftShap, and GradientShap performed well over all three tasks, while methods like Deconvolution, Guided Backpropagation, and LRP-α1-β0 struggled for some tasks. With acts such as the EU AI Act the application of XAI in the biomedical domain becomes more and more essential. Our evaluation study represents a first step toward verifying the suitability of different XAI methods for various medical domains.
Biology
What problem does this paper attempt to address?