An improved transformer network for skin cancer classification

Chao Xin,Zhifang Liu,Keyu Zhao,Linlin Miao,Yizhao Ma,Xiaoxia Zhu,Qiongyan Zhou,Songting Wang,Lingzhi Li,Feng Yang,Suling Xu,Haijiang Chen
DOI: https://doi.org/10.1016/j.compbiomed.2022.105939
Abstract:Background: Use of artificial intelligence to identify dermoscopic images has brought major breakthroughs in recent years to the early diagnosis and early treatment of skin cancer, the incidence of which is increasing year by year worldwide and poses a great threat to human health. Achievements have been made in the research of skin cancer image classification by using the deep backbone of the convolutional neural network (CNN). This approach, however, only extracts the features of small objects in the image, and cannot locate the important parts. Objectives: As a result, researchers of the paper turn to vision transformers (VIT) which has demonstrated powerful performance in traditional classification tasks. The self-attention is to improve the value of important features and suppress the features that cause noise. Specifically, an improved transformer network named SkinTrans is proposed. Innovations: To verify its efficiency, a three step procedure is followed. Firstly, a VIT network is established to verify the effectiveness of SkinTrans in skin cancer classification. Then multi-scale and overlapping sliding windows are used to serialize the image and multi-scale patch embedding is carried out which pay more attention to multi-scale features. Finally, contrastive learning is used which makes the similar data of skin cancer encode similarly so that the encoding results of different data are as different as possible. Main results: The experiment is carried out based on two datasets, namely (1) HAM10000: a large dataset of multi-source dermatoscopic images of common skin cancers; (2)A clinical dataset of skin cancer collected by dermoscopy. The model proposed has achieved 94.3% accuracy on HAM10000 and 94.1% accuracy on our datasets, which verifies the efficiency of SkinTrans. Conclusions: The transformer network has not only achieved good results in natural language but also achieved ideal results in the field of vision, which also lays a good foundation for skin cancer classification based on multimodal data. This paper is convinced that it will be of interest to dermatologists, clinical researchers, computer scientists and researchers in other related fields, and provide greater convenience for patients.
What problem does this paper attempt to address?