BSDSNet: Dual-Stream Feature Extraction Network Based on Segment Anything Model for Synthetic Aperture Radar Land Cover Classification

Yangyang Wang, Wengang Zhang, Weidong Chen, Chang Chen
DOI: https://doi.org/10.3390/rs16071150
IF: 5
2024-03-27
Remote Sensing
Abstract: Land cover classification using high-resolution Polarimetric Synthetic Aperture Radar (PolSAR) images obtained from satellites is a challenging task. While deep learning algorithms have been extensively studied for PolSAR image land cover classification, their performance is severely constrained by the scarcity of labeled PolSAR samples and the limited domain acceptance of models. Recently, the emergence of the Segment Anything Model (SAM), based on the vision transformer (ViT), has brought about a revolution in the study of specific downstream tasks in computer vision. Benefiting from its millions of parameters and extensive training datasets, SAM demonstrates powerful capabilities in semantic information extraction and generalization. To this end, we propose a dual-stream feature extraction network based on SAM, i.e., BSDSNet. We change the image encoder part of SAM to a dual stream, where the ConvNeXt image encoder is utilized to extract local information and the ViT image encoder is used to extract global information. BSDSNet achieves an in-depth exploration of semantic and spatial information in PolSAR images. Additionally, to facilitate a fine-grained amalgamation of information, the SA-Gate module is employed to integrate local–global information. Compared to previous deep learning models, BSDSNet's impressive ability to represent features is akin to a versatile receptive field, making it well suited for classifying PolSAR images across various resolutions. Comprehensive evaluations indicate that BSDSNet achieves excellent results in qualitative and quantitative evaluation when performing classification tasks on the AIR-PolSAR-Seg dataset and the WHU-OPT-SAR dataset. Compared to the second-best results, our method improves the Kappa metric by 3.68% and 0.44% on the AIR-PolSAR-Seg dataset and the WHU-OPT-SAR dataset, respectively.
environmental sciences,imaging science & photographic technology,remote sensing,geosciences, multidisciplinary
What problem does this paper attempt to address?
This paper addresses land cover classification in Polarimetric Synthetic Aperture Radar (PolSAR) images. For high-resolution PolSAR imagery, deep learning classifiers are held back by the scarcity of labeled samples and by the limited domain adaptability of existing models. The paper builds on the Segment Anything Model (SAM), which is itself based on the vision transformer (ViT), and proposes a dual-stream feature extraction network called BSDSNet. BSDSNet replaces SAM's single image encoder with two streams: a ConvNeXt encoder that extracts local information and a ViT encoder that extracts global information, so that both semantic and spatial information in PolSAR images are exploited. To fuse the two streams at a fine-grained level, the paper applies the SA-Gate module to integrate local and global information. According to the authors, this gives BSDSNet a stronger feature representation than previous deep learning models and makes it suitable for classifying PolSAR images at different resolutions. On the AIR-PolSAR-Seg and WHU-OPT-SAR datasets, BSDSNet is reported to perform well in both qualitative and quantitative evaluation, improving the Kappa metric over the second-best results by 3.68% and 0.44%, respectively. In summary, the main contribution is a dual-stream network that combines local and global information to improve land cover classification of PolSAR images.
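To make the idea of combining a local stream and a global stream concrete, here is a minimal sketch of a per-position gated fusion in plain Python. It is an illustration only and not the SA-Gate module or the BSDSNet implementation: the function name `gated_fusion`, the gate weights `w_local` and `w_global`, and the toy feature values are all assumptions introduced for this example. In a real network the two inputs would be feature maps produced by the ConvNeXt and ViT encoders, and the gate weights would be learned.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(local_feat, global_feat, w_local, w_global, bias=0.0):
    """Fuse two feature maps position by position (hypothetical helper).

    local_feat, global_feat: lists of positions, each a list of C floats,
        standing in for the outputs of a local and a global encoder.
    w_local, w_global: assumed gate weights of length C (learned in practice).
    Returns a list with the same shape as the inputs.
    """
    if len(local_feat) != len(global_feat):
        raise ValueError("both streams must cover the same positions")

    fused = []
    for loc, glo in zip(local_feat, global_feat):
        # One scalar gate per position, computed from both streams.
        score = bias
        score += sum(w * x for w, x in zip(w_local, loc))
        score += sum(w * x for w, x in zip(w_global, glo))
        gate = sigmoid(score)
        # Convex combination: gate near 1 favors local, near 0 favors global.
        fused.append([gate * l + (1.0 - gate) * g for l, g in zip(loc, glo)])
    return fused


if __name__ == "__main__":
    # Toy features: 2 positions, 2 channels each (made-up values).
    local = [[1.0, 2.0], [3.0, 4.0]]
    global_ = [[3.0, 0.0], [1.0, 0.0]]
    print(gated_fusion(local, global_, [0.0, 0.0], [0.0, 0.0]))
```

With all gate weights and the bias set to zero, every gate equals 0.5 and the fusion reduces to a plain average of the two streams; non-zero weights let each position lean toward whichever stream is more informative there.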