Transformer-based interpretable multi-modal data fusion for skin lesion classification

Theodor Cheslerean-Boghiu, Melia-Evelina Fleischmann, Theresa Willem, Tobias Lasser

2023-08-31

Abstract: A lot of deep learning (DL) research these days is mainly focused on improving quantitative metrics regardless of other factors. In human-centered applications, like skin lesion classification in dermatology, DL-driven clinical decision support systems are still in their infancy due to the limited transparency of their decision-making process. Moreover, the lack of procedures that can explain the behavior of trained DL algorithms leads to almost no trust from clinical physicians. To diagnose skin lesions, dermatologists rely on visual assessment of the disease and the data gathered from the patient's anamnesis. Data-driven algorithms dealing with multi-modal data are limited by the separation of feature-level and decision-level fusion procedures required by convolutional architectures. To address this issue, we enable single-stage multi-modal data fusion via the attention mechanism of transformer-based architectures to aid in diagnosing skin diseases. Our method beats other state-of-the-art single- and multi-modal DL architectures in image-rich and patient-data-rich environments. Additionally, the choice of the architecture enables native interpretability support for the classification task both in the image and metadata domain with no additional modifications necessary.

Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

The paper primarily addresses skin lesion classification by proposing a Transformer-based multimodal data fusion method that aims to improve diagnostic accuracy and enhance model interpretability. Specifically, the paper attempts to solve the following key problems:

1. **Improving the accuracy of skin lesion classification**: By combining image information (such as the visual features of skin lesions) with patient history and other metadata, the Transformer architecture is used to handle multimodal data fusion, with the goal of achieving better classification results than using images or metadata alone.
2. **Enhancing model transparency and interpretability**: In the medical field, particularly when dermatologists diagnose skin lesions, it is essential to understand how machine learning models make decisions. Therefore, this study focuses on making the model's decision-making process more transparent so that clinicians can trust and adopt these deep learning-based decision support systems.
3. **Addressing the limitations of existing deep learning models**: Many current deep learning models (especially those based on convolutional neural networks) have limitations when processing multimodal data, such as the separation of feature-level fusion and decision-level fusion, which restricts model performance. The proposed method aims to overcome these limitations by achieving single-stage multimodal data fusion.
4. **Evaluating the impact of different metadata combinations**: The paper also explores the impact of different quantities and types of metadata on model performance and demonstrates that proper metadata engineering can further improve classification performance.

In summary, the goal of this paper is to improve diagnostic accuracy in the task of skin lesion classification through multimodal data fusion and to increase clinicians' trust in such systems by enhancing model interpretability. Additionally, the study focuses on optimizing model performance by selecting appropriate metadata.
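
To make the idea of single-stage fusion concrete, the following is a minimal sketch and not the authors' architecture. It assumes that image patches and metadata fields have already been embedded as tokens of equal width, places them in one sequence behind a class token, and applies a single self-attention step. The identity query/key/value projections, the toy embedding values, and the token names (`patch_0`, `meta_age`, `meta_site`) are all hypothetical simplifications chosen for brevity. Reading the class token's attention row is one plausible way to inspect how much weight falls on each image and metadata token.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real model would learn these projections.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    weights = [softmax([dot(q, k) / scale for k in tokens]) for q in tokens]
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical toy embeddings (width 4); values are illustrative only.
cls_token = [0.1, 0.1, 0.1, 0.1]
image_tokens = [[0.9, 0.1, 0.0, 0.2], [0.2, 0.8, 0.1, 0.0]]
metadata_tokens = [[0.0, 0.1, 0.9, 0.3], [0.3, 0.0, 0.2, 0.7]]

# Single-stage fusion: both modalities share one token sequence.
sequence = [cls_token] + image_tokens + metadata_tokens
labels = ["cls", "patch_0", "patch_1", "meta_age", "meta_site"]

fused, attn = self_attention(sequence)
cls_attention = attn[0]  # how the class token weights every token

for name, weight in zip(labels, cls_attention):
    print(f"{name}: {weight:.3f}")
```

Because both modalities sit in the same sequence, no separate feature-level or decision-level fusion stage appears in this sketch; the attention step mixes image and metadata tokens directly.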

A multimodal transformer to fuse images and metadata for skin disease classification

Pay Less On Clinical Images: Asymmetric Multi-Modal Fusion Method For Efficient Multi-Label Skin Lesion Classification

A Novel Transfer Learning Framework for Multimodal Skin Lesion Analysis

A Deep CNN Transformer Hybrid Model for Skin Lesion Classification of Dermoscopic Images Using Focal Loss

Skin Cancer Detection utilizing Deep Learning: Classification of Skin Lesion Images using a Vision Transformer

RemixFormer++: A Multi-modal Transformer Model for Precision Skin Tumor Differential Diagnosis with Memory-efficient Attention

Skin Lesion classification based on two-modal images using a multi-scale fully-shared fusion network

Rectus sparing approach to left ventricular assist device exchange and use of the omental flap for coverage.

Single-Shared Network with Prior-Inspired Loss for Parameter-Efficient Multi-Modal Imaging Skin Lesion Classification

A Novel Perspective for Multi-modal Multi-label Skin Lesion Classification

Bi-directional Dermoscopic Feature Learning and Multi-scale Consistent Decision Fusion for Skin Lesion Segmentation

An improved transformer network for skin cancer classification

A Novel Vision Transformer Model for Skin Cancer Classification

Medical Diagnosis with Large Scale Multimodal Transformers: Leveraging Diverse Data for More Accurate Diagnosis

SkinDistilViT: Lightweight Vision Transformer for Skin Lesion Classification

MLFF-Net: a multi-model late feature fusion network for skin disease classification

MASDF-Net: A Multi-Attention Codec Network with Selective and Dynamic Fusion for Skin Lesion Segmentation

SUTrans-NET: a hybrid transformer approach to skin lesion segmentation

Enhanced deep bottleneck transformer model for skin lesion classification

Application of Multimodal Fusion Deep Learning Model in Disease Recognition