Abstract: Contrastive learning techniques make it possible to pretrain a general model in a self-supervised paradigm using a large number of unlabeled remote sensing images. The core idea is to pull positive samples, defined by data augmentation, closer together while pushing randomly sampled negative samples apart, which serves as the supervisory signal. This strategy rests on the strict identity hypothesis, i.e., positive samples are strictly defined by each (anchor) sample's own augmentation transformation. However, this leads to over-instancing of the features learned by the model and a loss of the ability to fully identify ground objects. Therefore, we proposed a relaxed identity hypothesis governing the feature distribution of different instances within the same class of features. Implementing the relaxed identity hypothesis requires both sampling and discriminating relaxed identical samples. To sample relaxed identical samples under the unsupervised learning paradigm, we exploited the observation that nearby objects in remote sensing images are often strongly correlated: neighborhood sampling was carried out around the anchor sample, and the similarity between the sampled samples and the anchor sample was defined as the semantic similarity. To discriminate samples under the relaxed identity hypothesis, the feature loss between the anchor sample and each sample in the relaxed identical sample queue was calculated and the queue was reordered accordingly; this feature loss was defined as the feature similarity. Through the sampling and discrimination of relaxed identical samples, the leap from instance-level features to class-level features was achieved to a certain extent, while the network's invariant learning of features was enhanced.
We validated the proposed method on three datasets, where it achieved the best experimental results on all three compared with six self-supervised methods.
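The two steps described in the abstract, neighborhood sampling around an anchor and reordering the sampled queue by feature similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the grid-shaped neighborhood, the `radius` and `stride` parameters, and the use of cosine similarity as a stand-in for the paper's feature loss are all assumptions made for illustration.

```python
import numpy as np

def neighborhood_offsets(radius, stride):
    """Grid of (dy, dx) offsets around the anchor, excluding the anchor itself.
    The square grid layout is an assumption, not taken from the paper."""
    steps = range(-radius, radius + 1, stride)
    return [(dy, dx) for dy in steps for dx in steps if (dy, dx) != (0, 0)]

def sample_neighbors(image, top, left, size, radius, stride):
    """Crop size x size patches near the anchor patch at (top, left).
    Crops that would fall outside the image are skipped."""
    h, w = image.shape[:2]
    patches = []
    for dy, dx in neighborhood_offsets(radius, stride):
        y, x = top + dy, left + dx
        if 0 <= y and y + size <= h and 0 <= x and x + size <= w:
            patches.append(image[y:y + size, x:x + size])
    return patches

def rank_by_feature_similarity(anchor_feat, queue_feats):
    """Reorder a queue of feature vectors by descending cosine similarity
    to the anchor feature (assumed proxy for the paper's feature loss).
    Returns the ordering and the similarities in that order."""
    a = anchor_feat / np.linalg.norm(anchor_feat)
    q = queue_feats / np.linalg.norm(queue_feats, axis=1, keepdims=True)
    sims = q @ a
    order = np.argsort(-sims)
    return order, sims[order]
```

In this sketch, the patches returned by `sample_neighbors` would be passed through an encoder (not shown) to obtain `queue_feats`, and the top-ranked entries would be treated as relaxed positives for the anchor.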

Mining Better Samples and Semantic Consistency for Contrast Learning in Forest Semantic Segmentation

When Masked Image Modeling Meets Source-free Unsupervised Domain Adaptation: Dual-Level Masked Network for Semantic Segmentation

Multi-Similarity Enhancement Network for Few-Shot Segmentation

C3T: Contrastive Consistency Cross-Network Learning for Semi-Supervised Semantic Segmentation

Exploring Cross-Image Pixel Contrast for Semantic Segmentation

RiSSNet: Contrastive Learning Network with a Relaxed Identity Sampling Strategy for Remote Sensing Image Semantic Segmentation

Region-aware Contrastive Learning for Semantic Segmentation

Cross-Image Pixel Contrasting for Semantic Segmentation

Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation

Memory-Based Contrastive Learning with Optimized Sampling for Incremental Few-Shot Semantic Segmentation

Remote Sensing Image Semantic Change Detection Boosted by Semi-supervised Contrastive Learning of Semantic Segmentation

Semi-Supervised Semantic Segmentation of Remote Sensing Images With Iterative Contrastive Network

Confidence-Weighted Dual-Teacher Networks With Biased Contrastive Learning for Semi-Supervised Semantic Segmentation in Remote Sensing Images

Spatial and Semantic Consistency Contrastive Learning for Self-Supervised Semantic Segmentation of Remote Sensing Images

Remote Sensing Images Semantic Segmentation with General Remote Sensing Vision Model via a Self-Supervised Contrastive Learning Method

A Spectral–Spatial Context-Boosted Network for Semantic Segmentation of Remote Sensing Images

Contextrast: Contextual Contrastive Learning for Semantic Segmentation

Degraded Image Semantic Segmentation Using Intra-image and Inter-image Contrastive Learning

Positive-Negative Equal Contrastive Loss for Semantic Segmentation

Stair Fusion Network With Context-Refined Attention for Remote Sensing Image Semantic Segmentation

Improving Semi-Supervised Semantic Segmentation with Dual-Level Siamese Structure Network