MSMA: A multi-stage and multi-attention algorithm for the classification of multimodal skin lesions

Ci Shu, Long Yu, Shengwei Tian, Xianwei Shi
DOI: https://doi.org/10.1016/j.bspc.2024.106180
IF: 5.1
2024-03-02
Biomedical Signal Processing and Control
Abstract: Skin lesion classification is a fundamental task in automated skin lesion analysis. Compared with a single modality, multiple modalities provide complementary information that has driven rapid progress in dermatological lesion classification. In this work, we propose a ConvNeXt-based multi-modal attention fusion framework for multi-modal skin lesion classification. To fully exploit the complementary information between modalities and achieve more comprehensive modal interaction and fusion, we build a multi-stage fusion framework on a dual-stream architecture. At each feature-extraction stage of image-modality fusion, a cross-modal feature interaction module first lets the features of the two streams interact along the spatial and channel dimensions while suppressing the introduction of noisy information. A multi-scale cross-attention fusion module then captures long-range dependencies and semantic information at different scales, promoting the flow and fusion of information between modalities. Finally, an image-text feature fusion module aggregates the image-modality and text-modality features. We validate the effectiveness of the proposed method on the publicly available multi-modal skin lesion dataset Derm7pt, where it achieves an average multi-modal skin lesion classification accuracy of 77.6%, outperforming current state-of-the-art methods and improving the average test-set accuracy by 1.3%.
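To make the idea of cross-attention fusion between two image streams concrete, here is a minimal NumPy sketch. It is not the authors' cross-modal feature interaction or multi-scale cross-attention fusion module: it assumes a generic single-head, single-scale scaled dot-product cross-attention in which tokens from one image stream (for instance, dermoscopic features) attend to tokens from the other (for instance, clinical features), followed by a residual connection, global average pooling, and plain concatenation with a text/metadata feature vector. The function names (`cross_attention`, `fuse`), the random projection matrices, and all shapes are hypothetical choices for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Generic single-head scaled dot-product cross-attention (illustrative only).

    query_feats:   (N_q, C) tokens from one image stream.
    context_feats: (N_k, C) tokens from the other image stream.
    w_q, w_k, w_v: (C, D) projection matrices (hypothetical; random in this sketch).
    Returns (N_q, D): each query token as a weighted mix of context tokens.
    """
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N_q, N_k) similarity scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v


def fuse(derm_map, clin_map, text_vec, rng):
    """Hypothetical two-way cross-attention fusion, then concatenation with text features."""
    h, w, c = derm_map.shape
    derm_tokens = derm_map.reshape(h * w, c)  # flatten the spatial grid into tokens
    clin_tokens = clin_map.reshape(-1, c)

    def proj():
        # Random (C, C) projection so the residual connection below keeps shape C.
        return rng.standard_normal((c, c)) / np.sqrt(c)

    derm_ctx = cross_attention(derm_tokens, clin_tokens, proj(), proj(), proj())
    clin_ctx = cross_attention(clin_tokens, derm_tokens, proj(), proj(), proj())

    # Residual connection, then global average pooling over tokens.
    derm_vec = (derm_tokens + derm_ctx).mean(axis=0)
    clin_vec = (clin_tokens + clin_ctx).mean(axis=0)
    return np.concatenate([derm_vec, clin_vec, text_vec])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    derm = rng.standard_normal((7, 7, 32))  # stand-in for one stage's feature map
    clin = rng.standard_normal((7, 7, 32))
    meta = rng.standard_normal(16)          # stand-in for encoded text/metadata
    fused = fuse(derm, clin, meta, rng)
    print(fused.shape)  # (80,)
```

The fused vector would then feed a classifier head. Running the attention in both directions lets each stream borrow context from the other; the paper's actual modules additionally operate at multiple stages and scales and include spatial/channel interaction, which this sketch deliberately omits.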
engineering, biomedical