Abstract:
Background: The use of artificial intelligence to interpret dermoscopic images has brought major breakthroughs in recent years to the early diagnosis and treatment of skin cancer, whose incidence is increasing year by year worldwide and which poses a great threat to human health. Progress has been made in skin cancer image classification using deep convolutional neural network (CNN) backbones. This approach, however, only extracts the features of small objects in the image and cannot locate the important parts.
Objectives: The authors therefore turn to vision transformers (ViT), which have demonstrated powerful performance in traditional classification tasks. Self-attention is intended to strengthen important features and suppress features that cause noise. Specifically, an improved transformer network named SkinTrans is proposed.
Innovations: To verify its efficiency, a three-step procedure is followed. First, a ViT network is established to verify the effectiveness of SkinTrans in skin cancer classification. Then, multi-scale, overlapping sliding windows are used to serialize the image, and multi-scale patch embedding is carried out, which pays more attention to multi-scale features. Finally, contrastive learning is used so that similar skin cancer data are encoded similarly while the encodings of different data are as different as possible.
Main results: Experiments are carried out on two datasets: (1) HAM10000, a large dataset of multi-source dermatoscopic images of common skin cancers; and (2) a clinical dataset of skin cancer collected by dermoscopy. The proposed model achieves 94.3% accuracy on HAM10000 and 94.1% accuracy on our clinical dataset, which verifies the efficiency of SkinTrans.
Conclusions: The transformer network has achieved good results not only in natural language processing but also in the field of vision, which lays a good foundation for skin cancer classification based on multimodal data. The authors believe this paper will be of interest to dermatologists, clinical researchers, computer scientists, and researchers in other related fields, and that it will provide greater convenience for patients.
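
Illustration: The abstract describes serializing an image with multi-scale, overlapping sliding windows before patch embedding. The following is a minimal NumPy sketch of that general idea, not the SkinTrans implementation. The function names, the patch sizes and strides (16/8 and 32/16), the embedding width of 64, and the random projection standing in for a learned linear embedding are all assumptions made for illustration.

```python
import numpy as np


def extract_overlapping_patches(image, patch_size, stride):
    """Slide a square window over an (H, W, C) image and flatten each patch.

    A stride smaller than patch_size makes neighbouring patches overlap.
    Returns an array of shape (num_patches, patch_size * patch_size * C).
    """
    height, width, _ = image.shape
    patches = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            window = image[top:top + patch_size, left:left + patch_size, :]
            patches.append(window.reshape(-1))
    return np.stack(patches)


def multi_scale_tokens(image, scales=((16, 8), (32, 16)), embed_dim=64, seed=0):
    """Embed patches from several (patch_size, stride) scales into one sequence.

    Hypothetical stand-in: a fixed random projection replaces the learned
    linear patch embedding a real transformer would train.
    """
    rng = np.random.default_rng(seed)
    token_groups = []
    for patch_size, stride in scales:
        patches = extract_overlapping_patches(image, patch_size, stride)
        projection = rng.normal(scale=0.02, size=(patches.shape[1], embed_dim))
        token_groups.append(patches @ projection)
    # Concatenate along the sequence axis so all scales share one token list.
    return np.concatenate(token_groups, axis=0)


if __name__ == "__main__":
    dummy_image = np.zeros((64, 64, 3), dtype=np.float32)
    tokens = multi_scale_tokens(dummy_image)
    print(tokens.shape)
```

For a 64x64 input, the 16-pixel window with stride 8 yields 7 x 7 = 49 patches and the 32-pixel window with stride 16 yields 3 x 3 = 9, so the combined sequence has 58 tokens. A transformer encoder would then attend over this mixed-scale sequence.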

EPVT: Environment-aware Prompt Vision Transformer for Domain Generalization in Skin Lesion Recognition

Prompt-driven Latent Domain Generalization for Medical Image Classification

Transitive Vision-Language Prompt Learning for Domain Generalization

Evidential Federated Learning for Skin Lesion Image Classification

AViT: Adapting Vision Transformers for Small Skin Lesion Segmentation Datasets

Assist-Dermo: A Lightweight Separable Vision Transformer Model for Multiclass Skin Lesion Classification

Self-supervised Vision Transformer are Scalable Generative Models for Domain Generalization

Learning A Low-Level Vision Generalist via Visual Task Prompt

PFPs: Prompt-guided Flexible Pathological Segmentation for Diverse Potential Outcomes Using Large Vision and Language Models

A multimodal transformer to fuse images and metadata for skin disease classification

A Novel Transfer Learning Framework for Multimodal Skin Lesion Analysis

Visual-Attribute Prompt Learning for Progressive Mild Cognitive Impairment Prediction

ICL-Net: Global and Local Inter-pixel Correlations Learning Network for Skin Lesion Segmentation

Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction

Skin Cancer Detection utilizing Deep Learning: Classification of Skin Lesion Images using a Vision Transformer

Artifact-Based Domain Generalization of Skin Lesion Models

An improved transformer network for skin cancer classification

Consistent representation via contrastive learning for skin lesion diagnosis

Semantic-Oriented Visual Prompt Learning for Diabetic Retinopathy Grading on Fundus Images

Cross-domain visual prompting with spatial proximity knowledge distillation for histological image classification

LSPT: Long-term Spatial Prompt Tuning for Visual Representation Learning