MICA: Towards Explainable Skin Lesion Diagnosis via Multi-Level Image-Concept Alignment

Yequan Bie, Luyang Luo, Hao Chen
2024-01-17
Abstract: Black-box deep learning approaches have showcased significant potential in the realm of medical image analysis. However, the stringent trustworthiness requirements intrinsic to the medical field have catalyzed research into the utilization of Explainable Artificial Intelligence (XAI), with a particular focus on concept-based methods. Existing concept-based methods predominantly apply concept annotations from a single perspective (e.g., global level), neglecting the nuanced semantic relationships between sub-regions and concepts embedded within medical images. This leads to underutilization of the valuable medical information and may cause models to fall short in harmoniously balancing interpretability and performance when employing inherently interpretable architectures such as Concept Bottlenecks. To mitigate these shortcomings, we propose a multi-modal explainable disease diagnosis framework that meticulously aligns medical images and clinical-related concepts semantically at multiple strata, encompassing the image level, token level, and concept level. Moreover, our method allows for model intervention and offers both textual and visual explanations in terms of human-interpretable concepts. Experimental results on three skin image datasets demonstrate that our method, while preserving model interpretability, attains high performance and label efficiency for concept detection and disease diagnosis.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
### The Problem Addressed by the Paper

This paper addresses the lack of transparency and interpretability of black-box deep learning methods in medical image analysis. Specifically, the authors propose MICA (Multi-level Image-Concept Alignment), a framework for skin lesion diagnosis that aligns images with clinical concepts in a multimodal, multi-level manner to improve both interpretability and performance.

**Main issues include:**

1. **Lack of interpretability in black-box models**: Although deep learning performs well in medical image analysis, its "black-box" nature makes the prediction process opaque, so it fails to meet the high trustworthiness requirements of the medical field.
2. **Limitations of existing interpretability methods**: Existing concept-based methods typically apply concept annotations from a single perspective (e.g., global level) and ignore the semantic relationships between image sub-regions and concepts, which leaves valuable medical information underused.
3. **Shortcomings of concept-based methods**: Inherently interpretable architectures such as Concept Bottlenecks tend to trade performance for interpretability, so their classification accuracy can fall below that of black-box methods.

### Solution

The MICA framework addresses the above issues in the following ways:

1. **Multi-level alignment**: Aligns medical images and clinically relevant concepts at three levels (image level, token level, and concept level), making fuller use of the available medical information to improve interpretability and performance.
2. **Utilizing language models**: Uses medical-domain large language models as concept encoders, enabling the model to capture the underlying semantics of each concept.
3. **Simultaneous disease diagnosis and concept detection**: The framework detects specific medical concepts while diagnosing diseases, and provides both textual and visual explanations.
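
To make the alignment and intervention ideas above more concrete, here is a minimal sketch of a concept-bottleneck-style pipeline. It is not the authors' implementation. It assumes that concept scores come from the cosine similarity between an image embedding and concept text embeddings, and that the diagnosis is a linear function of those scores. The function names (`cosine`, `concept_scores`, `diagnose`), the `temperature` parameter, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def concept_scores(image_emb, concept_embs, temperature=0.1):
    """Turn image-to-concept similarities into scores in (0, 1) via a sigmoid.

    Assumption: similarity to a concept's text embedding stands in for
    evidence that the concept is present in the image.
    """
    return [
        1.0 / (1.0 + math.exp(-cosine(image_emb, c) / temperature))
        for c in concept_embs
    ]


def diagnose(scores, weights, interventions=None):
    """Predict a class from concept scores with a linear layer.

    `interventions` maps a concept index to a corrected score, mimicking
    a clinician overriding a concept the model detected wrongly.
    """
    scores = list(scores)
    for idx, value in (interventions or {}).items():
        scores[idx] = value
    logits = [sum(w * s for w, s in zip(row, scores)) for row in weights]
    return logits.index(max(logits)), scores


# Toy, made-up embeddings and weights (not taken from the paper).
image_emb = [0.9, 0.1, 0.3]
concept_embs = [
    [1.0, 0.0, 0.2],  # hypothetical concept A
    [0.0, 1.0, 0.0],  # hypothetical concept B
]
weights = [
    [2.0, -1.0],  # class 0 relies on concept A
    [-1.0, 2.0],  # class 1 relies on concept B
]

scores = concept_scores(image_emb, concept_embs)
pred, _ = diagnose(scores, weights)

# Intervention: mark concept A as absent and concept B as present.
pred_after, _ = diagnose(scores, weights, interventions={0: 0.0, 1: 1.0})
```

Because the prediction depends only on the concept scores, each diagnosis can be traced back to named concepts, and correcting a score changes the output in a predictable way. In this toy example the prediction moves from class 0 to class 1 after the intervention.
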
### Experimental Results

Experiments on three skin image datasets show that MICA attains high performance and label efficiency for both concept detection and disease diagnosis while preserving model interpretability.