Abstract: High spatial resolution (HSR) remote sensing images have broad application prospects in urban planning, agricultural planning, and military training, which makes semantic segmentation of remote sensing images an important research problem. However, the large data volume and complex backgrounds of HSR remote sensing images put great pressure on algorithm efficiency. Down-sampling the image, or cropping it into small patches that are processed separately, relieves the pressure on GPU memory. However, the resulting loss of local detail (when down-sampling) or of global contextual information (when cropping) limits segmentation accuracy. In this study, we propose a multi-field context fusion network (MCFNet) that efficiently preserves both global and local information. The method consists of three modules: a backbone network, a patch selection module (PSM), and a multi-field context fusion module (FM). Specifically, the PSM uses a confidence-based local selection criterion to adaptively select local locations in the image that are poorly segmented. The FM then dynamically aggregates the semantic information of multiple visual fields centered on each selected location to enhance its segmentation. Because MCFNet performs segmentation enhancement only on these local locations, it improves segmentation accuracy without consuming excessive GPU memory. We evaluate our method on two HSR remote sensing image datasets, DeepGlobe and Potsdam, and compare it with state-of-the-art methods. The results show that MCFNet achieves the best balance among segmentation accuracy, memory efficiency, and inference speed.
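The abstract describes the PSM and FM only at a high level. The following is a minimal pure-Python sketch of the two ideas, written under assumptions that the abstract does not state. Per-pixel confidence is taken to be the maximum softmax probability, patches are non-overlapping square tiles, and the k tiles with the lowest mean confidence count as "poorly segmented". The FM's dynamic aggregation is stood in for by a softmax-weighted sum of per-field class logits. All function names (`confidence_map`, `select_patches`, `fuse_fields`) are hypothetical and are not taken from the MCFNet implementation.

```python
import math
from typing import List, Tuple

# Class logits for an H x W map with C classes (assumed layout: [y][x][class]).
Grid = List[List[List[float]]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def confidence_map(logits: Grid) -> List[List[float]]:
    """Assumed criterion: per-pixel confidence = max softmax probability."""
    return [[max(softmax(px)) for px in row] for row in logits]


def select_patches(conf: List[List[float]], patch: int, k: int) -> List[Tuple[int, int]]:
    """Tile the confidence map into non-overlapping patch x patch windows and
    return the top-left corners (y, x) of the k tiles with the lowest mean
    confidence, i.e. the regions a PSM-like step would hand on for refinement."""
    h, w = len(conf), len(conf[0])
    scored = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            vals = [conf[y + dy][x + dx] for dy in range(patch) for dx in range(patch)]
            scored.append((sum(vals) / len(vals), (y, x)))
    scored.sort(key=lambda t: t[0])  # least confident tiles first
    return [pos for _, pos in scored[:k]]


def fuse_fields(field_logits: List[List[float]], field_scores: List[float]) -> List[float]:
    """Hypothetical stand-in for dynamic multi-field aggregation at one location:
    each visual field contributes its class logits, weighted by a softmax over
    per-field scores (which a real network would predict)."""
    weights = softmax(field_scores)
    n_cls = len(field_logits[0])
    return [sum(w * f[c] for w, f in zip(weights, field_logits)) for c in range(n_cls)]
```

For example, on a 4 x 4 map whose top-left 2 x 2 block has uniform logits `[0.0, 0.0]` (confidence 0.5) and whose other pixels have `[5.0, 0.0]`, `select_patches(confidence_map(logits), 2, 1)` returns `[(0, 0)]`. Refining only the selected tiles, rather than every tile, is what keeps the memory cost of this kind of scheme bounded.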

SPFNet: Subspace Pyramid Fusion Network for Semantic Segmentation

FCPFNet: Feature Complementation Network with Pyramid Fusion for Semantic Segmentation

Technical Report on Subspace Pyramid Fusion Network for Semantic Segmentation

Enhanced Feature Pyramid Network for Semantic Segmentation

BMSeNet: Multiscale Context Pyramid Pooling and Spatial Detail Enhancement Network for Real-Time Semantic Segmentation

Efficient pyramid context encoding and feature embedding for semantic segmentation

Semantic Segmentation Based on Spatial Pyramid Pooling and Multilayer Feature Fusion

Enhanced Multi-Scale Feature Adaptive Fusion Sparse Convolutional Network for Large-Scale Scenes Semantic Segmentation

Semantic Segmentation Network Based on Adaptive Attention and Deep Fusion Utilizing a Multi-Scale Dilated Convolutional Pyramid

MSCFNet: A Lightweight Network with Multi-Scale Context Fusion for Real-Time Semantic Segmentation

S²-FPN: Scale-ware Strip Attention Guided Feature Pyramid Network for Real-time Semantic Segmentation

FuzzyNet: Context Encoding and Spatial Fuzzy Refinement Network in Semantic Segmentation

Multi-Field Context Fusion Network for Semantic Segmentation of High-Spatial-Resolution Remote Sensing Images

Adaptive Pyramid Context Network for Semantic Segmentation

Pyramid Fusion Transformer for Semantic Segmentation

FPANet: Feature Pyramid Aggregation Network for Real-Time Semantic Segmentation

Remote Sensing Image Semantic Segmentation Network Based on Multi-Scale Feature Enhancement Fusion

PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery

MFEAFN: Multi-scale feature enhanced adaptive fusion network for image semantic segmentation

CPFNet: Context Pyramid Fusion Network for Medical Image Segmentation

Pyramidal Region Context Module for Semantic Segmentation