MSGFormer: A DeepLabv3+ Like Semantically Masked and Pixel Contrast Transformer for MouseHole Segmentation
Peng Yang, Chunmei Li, Chengwu Fang, Shasha Kong, Yunpeng Jin, Kai Li, Haiyang Li, Xiangjie Huang, Yaosheng Han
DOI: https://doi.org/10.1109/access.2024.3372146
IF: 3.9
2024-03-09
IEEE Access
Abstract: In semantic segmentation, efficient representation of multi-scale context is essential. Inspired by the strong performance of Vision Transformers (ViT) in image classification, researchers have proposed several semantic segmentation ViTs, most of which achieve impressive results. However, these models often struggle to utilize multi-scale context effectively, disregard intra-image semantic context, and neglect the global context of the training data, i.e., the semantic relationships among pixels across different images. In this paper, we introduce Sliding Window Dilated Attention and combine it with Spatial Pyramid Pooling (SPP) to form a novel decoder called Sliding Window Dilated Attention Spatial Pyramid Pooling (SwinASPP). By adjusting the dilation rates of the sliding windows, this decoder captures multi-scale contextual information at different granularities. Additionally, we propose the Semantic Attention Block, which integrates semantic attention operations into the encoder. Finally, by adopting our proposed supervised pixel-wise contrastive learning algorithm, we shift the current training strategy for semantic segmentation to an inter-image one. Our experiments demonstrate that these methods improve performance on the SanJiangYuan MouseHole dataset and Cityscapes.
computer science, information systems; telecommunications; engineering, electrical & electronic
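
The abstract mentions a supervised pixel-wise contrastive learning algorithm that relates pixels across different images but gives no formula. As a rough illustration of the general idea only, the sketch below implements a generic supervised InfoNCE-style loss over sampled pixel embeddings: pixels sharing a class label are treated as positives, all other pixels as negatives. This is not the authors' algorithm. The function name, the temperature value, and the assumption that embeddings are sampled from a batch of several images are all hypothetical.

```python
import numpy as np


def supervised_pixel_contrast_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over pixel embeddings (illustrative).

    embeddings: (N, D) array of non-zero pixel embeddings, assumed to be
                sampled from one or more images in a batch.
    labels:     (N,) array of integer class ids, one per pixel.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)

    # L2-normalise so that dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    # A pixel is never its own positive or negative.
    not_self = ~np.eye(n, dtype=bool)
    positives = (labels[:, None] == labels[None, :]) & not_self

    # Numerically stable log-softmax over all other pixels.
    logits = np.where(not_self, logits, -np.inf)
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average log-probability of the positives for each anchor pixel;
    # anchors without any positive are skipped.
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0
    sum_pos = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float((-sum_pos[valid] / pos_count[valid]).mean())


# Toy usage: four 2-D "pixel" embeddings forming two tight clusters.
emb = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]])
loss_matched = supervised_pixel_contrast_loss(emb, np.array([0, 0, 1, 1]))
loss_mismatched = supervised_pixel_contrast_loss(emb, np.array([0, 1, 0, 1]))
```

In this toy case the loss is small when the labels agree with the clusters and large when they do not, which is the behaviour such a loss is meant to encourage: same-class pixels are pulled together in embedding space regardless of which image they come from.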