Abstract:

Background: In recent years, skin lesions have become a major public health concern, and their diagnosis and management depend heavily on accurate lesion segmentation. Traditional convolutional neural networks (CNNs) have shown promising results in skin lesion segmentation, but they are limited in their ability to capture long-range dependencies and intricate features. In addition, current medical image segmentation algorithms rarely consider how different categories are distributed across image regions, and they do not model the spatial relationships between pixels.

Objectives: This study proposes SapFormer, a self-adaptive, position-aware skin lesion segmentation model that captures both global context and fine-grained detail, models spatial relationships more effectively, and adapts to different positional characteristics. SapFormer is a multi-scale, dynamic, position-aware structure designed to represent the relationship between skin lesion characteristics and lesion distribution more flexibly. It also increases segmentation accuracy for skin lesions and reduces incorrect segmentation of non-lesion areas.

Innovations: SapFormer uses multiple hybrid transformers to encode skin images at multiple scales, and a transformer decoder to perform multi-scale positional feature sensing on the encoded features. This yields fine-grained features of the lesion area and optimizes the regional feature distribution. The self-adaptive feature framework, built on the transformer decoder module, dynamically and automatically generates learnable parameterizations at different positions. These parameterizations are derived from the multi-scale encoding characteristics of the input image. In addition, a cross-attention network optimizes the features of the current region according to the features of other regions, with the aim of increasing skin lesion segmentation accuracy.

Main results: The experiments use the ISIC-2016, ISIC-2017, and ISIC-2018 skin lesion datasets. On these datasets, the proposed model achieves accuracy values of 97.9 %, 94.3 %, and 95.7 %, IoU values of 93.2 %, 86.4 %, and 89.4 %, and DSC values of 96.4 %, 92.6 %, and 94.3 %, respectively. On all three metrics, the model surpasses the majority of state-of-the-art (SOTA) models. These results demonstrate that SapFormer can segment skin lesions precisely. Notably, the approach is highly resistant to noise in non-lesion areas while extracting finer-grained regional features from the skin lesion image.

Conclusions: Integrating a transformer-guided position-aware network into semantic skin lesion segmentation yields a notable performance boost. The ability of the proposed network to capture spatial relationships and fine-grained details proves beneficial for effective skin lesion segmentation. By enhancing lesion localization, feature extraction, quantitative analysis, and classification accuracy, the proposed segmentation model improves the diagnostic efficiency of skin lesion analysis on dermoscopic images. It assists dermatologists in making more accurate and efficient diagnoses, ultimately leading to better patient care and outcomes. This research paves the way for advances in diagnosing and treating skin lesions, promoting better understanding and decision-making in the clinical setting.
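The three reported metrics (accuracy, IoU, and DSC) can be illustrated with a minimal sketch. It assumes the standard pixel-wise definitions on binary masks; the abstract does not state the exact formulas or how scores are averaged over images, so the function names, the toy masks, and the empty-mask convention below are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Dict, Sequence, Tuple

# A 2-D binary mask: 1 = lesion pixel, 0 = background pixel.
Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, target: Mask) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) over all pixels of two equally sized masks."""
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred: Mask, target: Mask) -> Dict[str, float]:
    """Pixel accuracy, IoU (Jaccard index), and DSC (Dice) for one image."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    total = tp + fp + fn + tn
    union = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    iou = tp / union if union else 1.0
    dsc = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    accuracy = (tp + tn) / total if total else 1.0
    return {"accuracy": accuracy, "iou": iou, "dsc": dsc}


# Hypothetical 4x4 prediction and ground truth, differing in two pixels.
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
target = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
metrics = segmentation_metrics(pred, target)
print(metrics)
```

For a single image the two overlap scores are tied by DSC = 2·IoU / (1 + IoU), which is why DSC is always at least as large as IoU. Dataset-level figures are usually averages of per-image scores, so that identity need not hold exactly for the averaged values.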

SUTrans-NET: a hybrid transformer approach to skin lesion segmentation

Attention-Guided Network with Densely Connected Convolution for Skin Lesion Segmentation

CTH-Net: A CNN and Transformer hybrid network for skin lesion segmentation

Skin Lesion Segmentation Improved by Transformer-based Networks with Inter-scale Dependency Modeling

Enhancing skin lesion segmentation with a fusion of convolutional neural networks and transformer models

MT-TransUNet: Mediating Multi-Task Tokens in Transformers for Skin Lesion Segmentation and Classification

TESL-Net: A Transformer-Enhanced CNN for Accurate Skin Lesion Segmentation

HMT-Net: Transformer and MLP Hybrid Encoder for Skin Disease Segmentation

Intelligent skin lesion segmentation using deformable attention Transformer U-Net with bidirectional attention mechanism in skin cancer images

Transformer guided self-adaptive network for multi-scale skin lesion image segmentation

TransSea: Hybrid CNN–Transformer With Semantic Awareness for 3-D Brain Tumor Segmentation

SLT-Net: A codec network for skin lesion segmentation

FAT-Net: Feature adaptive transformers for automated skin lesion segmentation

MobileUNETR: A Lightweight End-To-End Hybrid Vision Transformer For Efficient Medical Image Segmentation

Inter-Scale Dependency Modeling for Skin Lesion Segmentation with Transformer-based Networks

An improved transformer network for skin cancer classification

CSU-Net: A CNN-Transformer Parallel Network for Multimodal Brain Tumour Segmentation

Enhanced deep bottleneck transformer model for skin lesion classification

SkinMamba: A Precision Skin Lesion Segmentation Architecture with Cross-Scale Global State Modeling and Frequency Boundary Guidance

SkinFormer: Learning Statistical Texture Representation with Transformer for Skin Lesion Segmentation