Metadata-P2T-Net: a Skin Lesion Classification Network Integrating Dermoscopic Metadata and Image Data

Weili Liu, Jingli Wang, Bo Wang
DOI: https://doi.org/10.1117/12.3038813
2024-01-01
Abstract: Early detection of skin cancer significantly improves patients' five-year survival rate. However, early-stage malignant tumors produce only subtle skin lesions, so their symptoms are easy to miss, and clinicians often need multiple biopsies and tissue extractions to determine the lesion type. Existing machine learning methods struggle to attend simultaneously to spatial detail and to the semantic features of metadata, which lowers recognition accuracy on skin lesion images. To represent spatial and metadata features effectively, and to avoid misclassifying easily distinguishable images through an overemphasis on detail, the authors propose a new skin lesion classification network, Metadata-P2T-Net. The method uses a P2T-based visual backbone and incorporates a Multi-Head Attention Feature Fusion Module (MHAFFM), which strengthens the model's ability to capture global context and dynamically weights visual features using metadata, highlighting the visual regions related to that metadata. A Multi-Dimensional Fusion Residual Module (MFRM) is also introduced to integrate visual, fused, and metadata features and improve classification performance. In experiments on the ISIC2019 dataset, Metadata-P2T-Net achieved a Matthews correlation coefficient (MCC) of 0.9086, an accuracy of 93.69%, a precision of 93.89%, a recall of 86.77%, and an F1-score of 90.18%. These metrics are significantly higher than those of the compared methods, indicating that the network extracts detail and metadata features effectively and offering a new basis for the diagnosis of dermoscopic images.
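For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how MCC, accuracy, precision, recall, and F1 can be computed from a multiclass confusion matrix. It is not the authors' evaluation code. The function names `classification_metrics` and `harmonic_f1`, the macro averaging of precision and recall, the multiclass form of MCC, and the toy 3-class matrix are all assumptions made for illustration; the abstract does not state which averaging scheme was used.

```python
from math import sqrt


def harmonic_f1(precision, recall):
    """F1 as the harmonic mean of a precision and a recall value."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(confusion):
    """Compute metrics from confusion[i][j] = count of true class i predicted as j.

    Assumption: precision and recall are macro-averaged over classes, and
    F1 is the harmonic mean of those two averages.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    true_counts = [sum(confusion[i]) for i in range(k)]  # row sums
    pred_counts = [sum(confusion[i][j] for i in range(k)) for j in range(k)]  # column sums

    accuracy = correct / total

    # Per-class precision and recall, guarding against empty rows or columns.
    precisions = [confusion[i][i] / pred_counts[i] if pred_counts[i] else 0.0
                  for i in range(k)]
    recalls = [confusion[i][i] / true_counts[i] if true_counts[i] else 0.0
               for i in range(k)]
    precision = sum(precisions) / k
    recall = sum(recalls) / k

    # Multiclass generalization of the Matthews correlation coefficient.
    numerator = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = (sqrt(total ** 2 - sum(p * p for p in pred_counts))
                   * sqrt(total ** 2 - sum(t * t for t in true_counts)))
    mcc = numerator / denominator if denominator else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": harmonic_f1(precision, recall),
        "mcc": mcc,
    }


# Hypothetical 3-class confusion matrix, not data from the paper.
toy = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
metrics = classification_metrics(toy)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

As a consistency check on the reported figures, `harmonic_f1(93.89, 86.77)` evaluates to roughly 90.19, which agrees with the reported F1-score of 90.18% to within rounding.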