MMT: Mixed-Mask Transformer for Remote Sensing Image Semantic Segmentation

Zhe Xu, Jie Geng, Wen Jiang
DOI: https://doi.org/10.1109/TGRS.2023.3289408
IF: 8.2
2023-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Remote sensing image semantic segmentation is a crucial step in the intelligent interpretation of remote sensing. Most of the current approaches are based on the attention mechanism to enhance long-range representations. However, these works ignore the key problem of foreground-background imbalance, and their performances encounter a bottleneck. In this article, we introduce mask classification into remote sensing image interpretation for the first time and propose a novel mixed-mask Transformer (MMT) for remote sensing image semantic segmentation. Specifically, we propose a mixed-mask attention mechanism, a simple but effective module, which assists the network to learn more explicit intraclass and interclass correlations by capturing long-range interdependent representations. In addition, a progressive multiscale learning strategy (MSL) is proposed to solve the problem of large-scale-varied targets in remote sensing images, which integrates semantic and visual representations of different scale targets by efficiently utilizing large-scale feature maps in Transformer. Experimental results show that the proposed MMT exceeds the existing alternative approaches and achieves state-of-the-art performance on three semantic segmentation datasets.
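Illustrative note: the abstract names mask classification but does not describe how predictions are decoded. The sketch below shows the generic mask-classification inference rule (as used in MaskFormer-style models), where each query predicts one class distribution and one binary mask, and the per-pixel label is the class with the highest summed product of class probability and mask probability. It is not the MMT implementation; the function name, the toy inputs, and the pure-Python layout are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def mask_classification_to_semantic(class_logits, mask_logits):
    """Generic mask-classification decoding (hypothetical helper, not from the paper).

    class_logits: Q lists of C + 1 scores; the last entry is the "no object" class.
    mask_logits:  Q masks, each an H x W nested list of scores.
    Returns an H x W nested list of class indices.
    """
    num_queries = len(class_logits)
    num_classes = len(class_logits[0]) - 1
    # Drop the "no object" probability after normalising.
    class_probs = [softmax(row)[:num_classes] for row in class_logits]
    height, width = len(mask_logits[0]), len(mask_logits[0][0])

    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of class c at this pixel: sum over queries of p(c | q) * p(mask_q covers pixel).
            scores = [
                sum(
                    class_probs[q][c] * sigmoid(mask_logits[q][y][x])
                    for q in range(num_queries)
                )
                for c in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        labels.append(row)
    return labels


# Toy example: two queries, two classes, a 2 x 2 image.
class_logits = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]]  # query 0 -> class 0, query 1 -> class 1
mask_logits = [
    [[5.0, -5.0], [5.0, -5.0]],   # query 0 covers the left column
    [[-5.0, 5.0], [-5.0, 5.0]],   # query 1 covers the right column
]
label_map = mask_classification_to_semantic(class_logits, mask_logits)
print(label_map)  # [[0, 1], [0, 1]]
```

Because each query predicts a whole region rather than a per-pixel label, a small foreground object can be represented by its own query, which is one intuition for why this formulation is attractive under foreground-background imbalance.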
What problem does this paper attempt to address?