A Remote Sensing Image Segmentation Model Based on Multi-Scale Feature Fusion

Ao Liu, Jianwei Lu, Qianli Ma, Yi Han, Yanfei Zhong
DOI: https://doi.org/10.1117/12.3007424
2023-01-01
Abstract: Large-scale remote sensing images contain rich terrain information, while small-scale images capture abundant detail. Effectively extracting and fusing this multi-scale information has become a major challenge. To address it, this paper proposes a novel remote sensing image segmentation model called Multi-scale Vision Transformer (MS-ViT). The MS-ViT model comprises a multi-scale feature extraction module, a feature fusion module based on multi-scale self-attention, and a decoder based on convolutional neural networks. In addition, the model introduces a spatial attention mechanism and a position encoding method that takes patch size into account, together with a feature distribution alignment method and corresponding loss functions, to further improve segmentation performance. Experimental results show that MS-ViT significantly outperforms several other advanced models.
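The abstract does not give implementation details, so the following is only a minimal NumPy sketch of the general idea of fusing tokens from several patch sizes with self-attention. It is not the authors' MS-ViT code. The function names, the random projections, the per-scale embedding vector (a crude stand-in for a patch-size-aware position encoding), and the choice of patch sizes 4 and 8 are all assumptions made for illustration.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping patch x patch tiles, flattened."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return tiles.reshape(-1, patch * patch * c)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_scale_tokens(image, patch_sizes, dim, rng):
    """Embed patches of each size into a shared dim and tag them with a scale vector."""
    tokens = []
    for p in patch_sizes:
        flat = patchify(image, p)  # (num_patches, p * p * C)
        # Hypothetical linear projection; random here, learned in a real model.
        proj = rng.standard_normal((flat.shape[1], dim)) / np.sqrt(flat.shape[1])
        # Hypothetical per-scale embedding: one vector shared by all patches of size p.
        scale_embed = 0.02 * rng.standard_normal(dim)
        tokens.append(flat @ proj + scale_embed)
    return np.concatenate(tokens, axis=0)

def self_attention(tokens, rng):
    """Single-head self-attention over all tokens, so scales can attend to each other."""
    n, d = tokens.shape
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    weights = softmax(q @ k.T / np.sqrt(d))  # (n, n), each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # stand-in for a remote sensing tile
tokens = multi_scale_tokens(image, patch_sizes=(4, 8), dim=16, rng=rng)
fused, weights = self_attention(tokens, rng)
```

With a 32x32 input, patch size 4 yields 64 tokens and patch size 8 yields 16, so attention runs over 80 tokens in total. A segmentation model would still need a decoder to map the fused tokens back to a per-pixel label map; that part is omitted here.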