Abstract:Remote sensing techniques for shoreline extraction are crucial for monitoring changes in erosion rates, surface hydrology, and ecosystem structure. In recent years, Convolutional neural networks (CNNs) have developed as a cutting-edge deep learning technique that has been extensively used in shoreline extraction from remote sensing images, owing to their exceptional feature extraction capabilities. They are progressively replacing traditional methods in this field. However, most CNN models only focus on the features in local receptive fields, and overlook the consideration of global contextual information, which will hamper the model's ability to perform a precise segmentation of boundaries and small objects, consequently leading to unsatisfactory segmentation results. To solve this problem, we propose a parallel semantic segmentation network (TCU-Net) combining CNN and Transformer, to extract shorelines from multispectral remote sensing images, and improve the extraction accuracy. Firstly, TCU-Net imports the Pyramid Vision Transformer V2 (PVT V2) network and ResNet, which serve as backbones for the Transformer branch and CNN branch, respectively, forming a parallel dual-encoder structure for the extraction of both global and local features. Furthermore, a feature interaction module is designed to achieve information exchange, and complementary advantages of features, between the two branches. Secondly, for the decoder part, we propose a cross-scale multi-source feature fusion module to replace the original UNet decoder block, to aggregate multi-scale semantic features more effectively. In addition, a sea-land segmentation dataset covering the Yellow Sea region (GF Dataset) is constructed through the processing of three scenes from Gaofen-6 remote sensing images. We perform a comprehensive experiment with the GF dataset to compare the proposed method with mainstream semantic segmentation models, and the results demonstrate that TCU-Net outperforms the competing models in all three evaluation indices: the PA (pixel accuracy), F1-score, and MIoU (mean intersection over union), while requiring significantly fewer parameters and computational resources compared to other models. These results indicate that the TCU-Net model proposed in this article can extract the shoreline from remote sensing images more effectively, with a shorter time, and lower computational overhead.

TMNet: A Two-Branch Multi-Scale Semantic Segmentation Network for Remote Sensing Images

TCNet: Multiscale Fusion of Transformer and CNN for Semantic Segmentation of Remote Sensing Images

MFTransNet: A Multi-Modal Fusion with CNN-Transformer Network for Semantic Segmentation of HSR Remote Sensing Images

SSNet: A Novel Transformer and CNN Hybrid Network for Remote Sensing Semantic Segmentation

GLE-net: global-local information enhancement for semantic segmentation of remote sensing images

Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

CTMFNet: CNN and Transformer Multiscale Fusion Network of Remote Sensing Urban Scene Imagery

A Transformer-based Multi-Modal Fusion Network for Semantic Segmentation of High-Resolution Remote Sensing Imagery

Local-enhanced multi-scale aggregation swin transformer for semantic segmentation of high-resolution remote sensing images

ACTNet: A Dual-Attention Adapter with a CNN-Transformer Network for the Semantic Segmentation of Remote Sensing Imagery

Enhanced Multi-Scale Feature Adaptive Fusion Sparse Convolutional Network for Large-Scale Scenes Semantic Segmentation

Multiscale Global Context Network for Semantic Segmentation of High-Resolution Remote Sensing Images

TCUNet: A Lightweight Dual-Branch Parallel Network for Sea-Land Segmentation in Remote Sensing Images

CNN and Transformer Fusion for Remote Sensing Image Semantic Segmentation

Multi-scale attention fusion network for semantic segmentation of remote sensing images

Scale-Aware Neural Network for Semantic Segmentation of Multi-Resolution Remote Sensing Images

Densely Based Multi-Scale and Multi-Modal Fully Convolutional Networks for High-Resolution Remote-Sensing Image Semantic Segmentation

A Crossmodal Multiscale Fusion Network for Semantic Segmentation of Remote Sensing Data

RemoteNet: Remote Sensing Image Segmentation Network based on Global-Local Information

Unsupervised Multi-Scale Hybrid Feature Extraction Network for Semantic Segmentation of High-Resolution Remote Sensing Images

Encoder- and Decoder-Based Networks Using Multiscale Feature Fusion and Nonlocal Block for Remote Sensing Image Semantic Segmentation