BSDSNet: Dual-Stream Feature Extraction Network Based on Segment Anything Model for Synthetic Aperture Radar Land Cover Classification

Yangyang Wang, Wengang Zhang, Weidong Chen, Chang Chen

DOI: https://doi.org/10.3390/rs16071150

IF: 5

2024-03-27

Remote Sensing

Abstract: Land cover classification using high-resolution Polarimetric Synthetic Aperture Radar (PolSAR) images obtained from satellites is a challenging task. While deep learning algorithms have been extensively studied for PolSAR image land cover classification, the performance is severely constrained due to the scarcity of labeled PolSAR samples and the limited domain acceptance of models. Recently, the emergence of the Segment Anything Model (SAM) based on the vision transformer (ViT) model has brought about a revolution in the study of specific downstream tasks in computer vision. Benefiting from its millions of parameters and extensive training datasets, SAM demonstrates powerful capabilities in extracting semantic information and generalization. To this end, we propose a dual-stream feature extraction network based on SAM, i.e., BSDSNet. We change the image encoder part of SAM to a dual stream, where the ConvNeXt image encoder is utilized to extract local information and the ViT image encoder is used to extract global information. BSDSNet achieves an in-depth exploration of semantic and spatial information in PolSAR images. Additionally, to facilitate a fine-grained amalgamation of information, the SA-Gate module is employed to integrate local–global information. Compared to previous deep learning models, BSDSNet's impressive ability to represent features is akin to a versatile receptive field, making it well suited for classifying PolSAR images across various resolutions. Comprehensive evaluations indicate that BSDSNet achieves excellent results in qualitative and quantitative evaluation when performing classification tasks on the AIR-PolSAR-Seg dataset and the WHU-OPT-SAR dataset. Compared to the suboptimal results, our method improves the Kappa metric by 3.68% and 0.44% on the AIR-PolSAR-Seg dataset and the WHU-OPT-SAR dataset, respectively.
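
The Kappa metric cited in the abstract is presumably Cohen's kappa coefficient, which corrects raw pixel agreement for the agreement expected by chance. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name and the toy class labels are hypothetical, and returning 1.0 for the degenerate single-class case is a convention chosen here.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa for two equal-length sequences of class labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(y_true)

    # Observed agreement: fraction of pixels where prediction equals the label.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: product of the marginal class frequencies, summed over classes.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        # Both sequences contain one identical class; kappa is undefined,
        # and 1.0 is returned here purely by convention (an assumption).
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy example with hypothetical land cover labels (not data from the paper).
labels = ["water", "water", "urban", "urban", "forest", "forest"]
preds = ["water", "water", "urban", "forest", "forest", "forest"]
kappa = cohens_kappa(labels, preds)
```

In this toy case five of six pixels agree (observed agreement 5/6) and the chance agreement is 1/3, so kappa comes out at about 0.75, noticeably lower than the raw accuracy.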

environmental sciences; imaging science & photographic technology; remote sensing; geosciences, multidisciplinary

What problem does this paper attempt to address?

This paper addresses land cover classification in Synthetic Aperture Radar (SAR) imagery, building on the Segment Anything Model (SAM), which is itself based on the Vision Transformer (ViT). For high-resolution polarimetric SAR (PolSAR) images, existing deep learning algorithms are constrained by the scarcity of labeled samples and by the limited domain acceptance of the models. The paper proposes BSDSNet, a dual-stream feature extraction network that replaces SAM's single image encoder with two streams: a ConvNeXt encoder that extracts local information and a ViT encoder that extracts global information, so that both the semantic and the spatial information in PolSAR images are exploited. For fine-grained fusion, the SA-Gate module integrates the local and global features. Compared with previous deep learning models, BSDSNet has a stronger feature representation ability and is suited to classifying PolSAR images at different resolutions. On the AIR-PolSAR-Seg and WHU-OPT-SAR datasets, BSDSNet achieves excellent qualitative and quantitative results, improving the Kappa metric over the second-best results by 3.68% and 0.44%, respectively. In summary, the main contribution is a dual-stream network that combines local and global information to improve land cover classification of SAR images.
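
To make the idea of gated local-global fusion more concrete, here is a deliberately simplified sketch. It is not the SA-Gate module and not the BSDSNet implementation: it only illustrates a per-position softmax gate that weights a "local" feature vector against a "global" one. The function name, the plain-list feature layout, and the weight vectors `w_local` and `w_global` (which would be learned in a real network) are all assumptions made for illustration.

```python
import math


def softmax_gate_fuse(local_feat, global_feat, w_local, w_global):
    """Fuse two feature maps with a per-position two-way softmax gate.

    local_feat, global_feat: lists of positions, each a list of C channel values.
    w_local, w_global: hypothetical weight vectors of length 2*C that score the
    concatenated [local, global] features for each stream.
    """
    fused = []
    for loc, glo in zip(local_feat, global_feat):
        concat = loc + glo  # concatenate both streams at this position

        # One scalar score per stream, computed from the concatenated features.
        s_l = sum(w * x for w, x in zip(w_local, concat))
        s_g = sum(w * x for w, x in zip(w_global, concat))

        # Numerically stable softmax over the two scores.
        m = max(s_l, s_g)
        e_l, e_g = math.exp(s_l - m), math.exp(s_g - m)
        g_l = e_l / (e_l + e_g)
        g_g = 1.0 - g_l

        # Gate-weighted sum of the two streams, channel by channel.
        fused.append([g_l * a + g_g * b for a, b in zip(loc, glo)])
    return fused


# Toy input: one spatial position with two channels per stream.
local_feat = [[1.0, 2.0]]
global_feat = [[3.0, 4.0]]

# Zero weights give equal gates, so the result is the plain average.
balanced = softmax_gate_fuse(local_feat, global_feat, [0.0] * 4, [0.0] * 4)

# A large score for the local stream pushes the output toward local_feat.
local_heavy = softmax_gate_fuse(local_feat, global_feat, [10.0, 0.0, 0.0, 0.0], [0.0] * 4)
```

Because the two gate values sum to one at every position, the fused feature always lies between the two streams; the gate only decides which stream dominates where. A real implementation would operate on tensors and learn the gate parameters end to end.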

ClassWise-SAM-Adapter: Parameter Efficient Fine-tuning Adapts Segment Anything to SAR Domain for Semantic Segmentation

Terrain Segmentation in Polarimetric SAR Images Using Dual-Attention Fusion Network

OPT-SAR-MS2Net: A Multi-Source Multi-Scale Siamese Network for Land Object Classification Using Remote Sensing Images

Boundary-enhanced dual-stream network for semantic segmentation of high-resolution remote sensing images

SDF2Net: Shallow to Deep Feature Fusion Network for PolSAR Image Classification

Semantic-Guided Attention Refinement Network for Salient Object Detection in Optical Remote Sensing Images

A Refined Pyramid Scene Parsing Network for Polarimetric SAR Image Semantic Segmentation in Agricultural Areas

PolSAR Image Crop Classification Based on Deep Residual Learning Network

Dual-Branch Fusion of Convolutional Neural Network and Graph Convolutional Network for PolSAR Image Classification

Dual-Branch CNN Incorporating Multiscale SVD Profile for PolSAR Image Classification

RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation

MP-ResNet: Multi-path Residual Network for the Semantic segmentation of High-Resolution PolSAR Images

Land Cover Classification of Synthetic Aperture Radar Images Based on Encoder-Decoder Network with an Attention Mechanism

Polarimetric Synthetic Aperture Radar Image Semantic Segmentation Network with Lovász-softmax Loss Optimization

SAR Image Segmentation Based on Hierarchical Visual Semantic and Adaptive Neighborhood Multinomial Latent Model

Aerial-BiSeNet: A real-time semantic segmentation network for high resolution aerial imagery

Semantic Assistance in SAR Object Detection: A Mask-Guided Approach

A Network for Merging SAR Image Sea-Land Segmentation and Coastline Detection Tasks

Dual-Stream Class-Adaptive Network for Semi-Supervised Hyperspectral Image Classification

Semantic Attention and Structured Model for Weakly Supervised Instance Segmentation in Optical and SAR Remote Sensing Imagery