Abstract: In an era of increasingly advanced Earth Observation (EO) technologies, extracting pertinent information (such as water bodies) from the Earth's surface has become a crucial task. Deep learning, especially via pre-trained models, currently offers a highly promising approach to the semantic segmentation of Remote Sensing Imagery (RSI). However, effectively adapting these pre-trained models to RSI tasks remains challenging. Typically, these models are fine-tuned for specialized tasks, which involves modifying the parameters or structure of the original architecture and may impair their inherent generalization capabilities. Furthermore, robust models pre-trained on natural images are not specifically designed for RSI, which makes their direct application to RSI tasks difficult. To alleviate these problems, our study introduces a lightweight Enhanced Semantic-positional Feature Fusion Network (ESFFNet), leveraging diverse pre-trained image encoders alongside extensive EO data. The proposed method first uses pre-trained encoders, specifically Vision Transformer (ViT)-based and Convolutional Neural Network (CNN)-based models, to extract deep semantic and precise positional features, respectively, without additional training. We then introduce the Enhanced Semantic-positional Feature Fusion Module (ESFFM), which merges semantic features from the ViT-based encoder with spatial features from the CNN-based encoder. This integration is realized via multi-scale feature fusion, local and long-distance feature integration, and dense connectivity strategies, yielding a robust feature representation. Finally, the Primary Segmentation-guided Fine Extraction Module (PSFEM) further improves the precision of remote sensing image segmentation. Together, these two modules constitute our lightweight decoder, which has fewer than 4 M parameters.
We evaluate our approach on two distinct water-body datasets, and the results indicate that it outperforms other leading segmentation techniques. Our method also performs strongly on other remote sensing segmentation tasks, such as building extraction and land cover classification. The source code will be available at https://github.com/zhilyzhang/ESFFNet.
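
The fusion of coarse semantic features with fine positional features described above can be illustrated at the level of tensor shapes. The following is a minimal sketch, not the authors' ESFFM, which is a learned module whose details are not given here. It assumes feature maps stored as nested lists shaped [channels][height][width], and it swaps in plain nearest-neighbour upsampling followed by channel concatenation as a stand-in for the fusion step. The function names `upsample_nearest` and `fuse_features` and all shapes are hypothetical.

```python
# Illustrative sketch only: not the authors' ESFFM, which is a learned module.
# Feature maps are nested lists shaped [channels][height][width].

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a [C][H][W] feature map by an integer factor."""
    return [
        [
            [row[x // factor] for x in range(len(row) * factor)]
            for row in channel
            for _ in range(factor)  # repeat each row `factor` times
        ]
        for channel in feat
    ]


def fuse_features(semantic, positional):
    """Bring coarse semantic features to the fine grid, then concatenate channels."""
    fine_h, coarse_h = len(positional[0]), len(semantic[0])
    if fine_h % coarse_h != 0:
        raise ValueError("fine height must be an integer multiple of coarse height")
    upsampled = upsample_nearest(semantic, fine_h // coarse_h)
    return upsampled + positional  # channel-wise concatenation


# Hypothetical shapes: 1 coarse "ViT-like" channel (2x2), 2 fine "CNN-like" channels (4x4).
semantic = [[[1, 2], [3, 4]]]
positional = [[[0] * 4 for _ in range(4)] for _ in range(2)]
fused = fuse_features(semantic, positional)
print(len(fused), len(fused[0]), len(fused[0][0]))  # channels, height, width
```

In a real network the concatenated channels would pass through trainable layers; only those decoder layers would count toward the sub-4 M parameter budget, since the encoders are used without additional training.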

End-to-End Semantic Segmentation Utilizing Multi-scale Baseline Light Field

Semi-Supervised Semantic Segmentation for Light Field Images Using Disparity Information

NLFNet: Non-Local Fusion Towards Generalized Multimodal Semantic Segmentation Across RGB-Depth, Polarization, and Thermal Images

Semantic Segmentation With Light Field Imaging and Convolutional Neural Networks

Light-field-depth-estimation Network Based on Epipolar Geometry and Image Segmentation

Enhanced Multi-Scale Feature Adaptive Fusion Sparse Convolutional Network for Large-Scale Scenes Semantic Segmentation

Incorporating Luminance, Depth and Color Information by a Fusion-based Network for Semantic Segmentation

Disentangling Light Fields for Super-Resolution and Disparity Estimation

LMANet: A Lightweight Asymmetric Semantic Segmentation Network Based on Multi-Scale Feature Extraction

Semantic Segmentation of Very-High-Resolution Remote Sensing Images via Deep Multi-Feature Learning

A Lightweight Depth Estimation Network for Wide-Baseline Light Fields

Light Field Image Super-Resolution Using Deformable Convolution

Light Field Reconstruction using Efficient Pseudo 4D Epipolar-Aware Structure

Light-Deeplabv3+: a lightweight real-time semantic segmentation method for complex environment perception

LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing

Light field super-resolution using complementary-view feature attention

Light Field Salient Object Detection with Sparse Views via Complementary and Discriminative Interaction Network

An optimal polynomial time algorithm for the common cycle economic lot and delivery scheduling problem

SLLEN: Semantic-aware Low-light Image Enhancement Network

Enhanced semantic-positional feature fusion network via diverse pre-trained encoders for remote sensing image water-body segmentation

LFEA-Net: semantic segmentation for urban point cloud scene via local feature extraction and aggregation