On the evaluation of deep learning interpretability methods for medical images under the scope of faithfulness

Vangelis Lamprou, Athanasios Kallipolitis, Ilias Maglogiannis
DOI: https://doi.org/10.1016/j.cmpb.2024.108238
IF: 6.1
2024-05-30
Computer Methods and Programs in Biomedicine
Abstract: Background and Objective: Evaluating the interpretability of Deep Learning models is crucial for building trust and gaining insights into their decision-making processes. In this work, we employ class activation map based attribution methods in a setting where only High-Resolution Class Activation Mapping (HiResCAM) is known to produce faithful explanations. The objective is to evaluate the quality of the attribution maps using quantitative metrics and to investigate whether faithfulness aligns with the metric results. Methods: We fine-tune pre-trained deep learning architectures on four medical image datasets in order to calculate attribution maps. The maps are evaluated with three well-established quantitative metrics. Results: Our experimental findings suggest that the Area Over Perturbation Curve (AOPC) and Max-Sensitivity scores favour the HiResCAM maps. On the other hand, the Heatmap Assisted Accuracy Score (HAAS) offers no insight into our comparison, as it evaluates almost all maps as inaccurate. To this end, we further compare our calculated values against values obtained from a diverse group of models trained on non-medical benchmark datasets, which eventually yields more responsive results. Conclusion: This study develops a series of experiments to discuss the connection between faithfulness and quantitative metrics over medical attribution maps. HiResCAM preserves the gradient effect at the pixel level, ultimately producing high-resolution, informative and resilient mappings. This, in turn, is reflected in the AOPC and Max-Sensitivity results, which successfully identify the faithful algorithm. With regard to HAAS, our experiments indicate that it is sensitive to complex medical patterns, commonly characterized by strong colour dependency and multiple attention areas.
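The abstract's remark that HiResCAM preserves the gradient effect at the pixel level can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names `grad_cam` and `hires_cam`, the `(K, H, W)` array layout, and the toy values are assumptions made for illustration. It relies on the standard formulations, in which Grad-CAM averages each channel's gradient over space before weighting the activation map, whereas HiResCAM multiplies gradients and activations element-wise before summing over channels.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM sketch: one spatially averaged weight per channel.

    Both inputs are assumed to have shape (K, H, W): K feature maps from the
    last convolutional layer and the gradient of the class score with respect
    to each of them.
    """
    weights = gradients.mean(axis=(1, 2))             # shape (K,)
    cam = np.tensordot(weights, activations, axes=1)  # shape (H, W)
    return np.maximum(cam, 0.0)                       # ReLU, as in Grad-CAM

def hires_cam(activations, gradients):
    """HiResCAM sketch: element-wise product, then sum over channels.

    No spatial averaging is applied, so each location keeps its own gradient.
    """
    return (gradients * activations).sum(axis=0)      # shape (H, W)

# Hypothetical toy input: one channel, two locations, opposite-signed gradients.
A = np.array([[[1.0, 1.0]]])
G = np.array([[[1.0, -1.0]]])

print(grad_cam(A, G))
print(hires_cam(A, G))
```

In this toy case the averaged gradient is zero, so the Grad-CAM map carries no information, while the HiResCAM map still distinguishes the two locations.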
engineering, biomedical, computer science, interdisciplinary applications, medical informatics, theory & methods