Abstract: Semantic segmentation of remote sensing images is a core task in remote sensing. Training high-quality models for this task requires a large number of annotated images, and manual annotation is time-consuming and labor-intensive. This has motivated the development of semi-supervised semantic segmentation methods. However, the complexity of foreground categories in remote sensing images makes it difficult to maintain prediction consistency. Moreover, inherent characteristics such as intraclass variation and interclass similarity cause features of different classes to become partly confused in the feature space, which degrades the final classification results. To improve the model's consistency and the feature-based classification of categories, this article proposes a new semi-supervised semantic segmentation framework that combines consistency regularization and contrastive learning (CL). For consistency regularization, the proposed method employs dual-teacher networks, introduces ClassMix for image augmentation, and uses confidence levels to fuse the predictions of the two teachers. By introducing perturbations at both the network and image levels while enforcing consistency, the method improves the predictive ability and generalization of the model. For CL, positive-unlabeled learning (PU-Learning) is used to mitigate mis-sampling when selecting features. At the same time, larger (biased) weights are assigned to harder negative samples, which makes feature learning more challenging and enhances the discriminative capability of the final feature representation space. Extensive experiments on the ISPRS Vaihingen dataset and the challenging iSAID dataset demonstrate the superior performance of our approach.
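The abstract only names the ingredients, so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The function names, the per-pixel max-softmax weighting used to fuse the two teachers, and the hyperparameters `t`, `beta`, and `tau_plus` are all hypothetical. The contrastive term is swapped in from the hard-negative contrastive loss of Robinson et al. (2021), which pairs a PU-style correction for false negatives (via an assumed positive-class prior `tau_plus`) with hardness weighting of negatives; the paper's own loss may differ.

```python
import numpy as np

def softmax(logits, axis=0):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_teachers(logits_a, logits_b):
    """Confidence-weighted fusion of two teachers' (C, H, W) logits.

    Hypothetical rule: each teacher's per-pixel weight is its max softmax
    probability, normalized so the two weights sum to 1 at every pixel.
    """
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    c_a, c_b = p_a.max(axis=0), p_b.max(axis=0)        # (H, W) confidences
    w_a = c_a / (c_a + c_b)
    fused = w_a * p_a + (1.0 - w_a) * p_b              # (H, W) broadcasts over C
    return fused.argmax(axis=0), fused.max(axis=0)     # pseudo-label, confidence

def classmix(img_a, img_b, lbl_a, lbl_b, rng):
    """ClassMix: paste the pixels of half the classes in lbl_a onto img_b.

    Images are (3, H, W); labels (or pseudo-labels) are (H, W) integer maps.
    """
    classes = np.unique(lbl_a)
    chosen = rng.choice(classes, size=max(1, len(classes) // 2), replace=False)
    mask = np.isin(lbl_a, chosen)                      # (H, W) boolean mask
    mixed_img = np.where(mask[None], img_a, img_b)
    mixed_lbl = np.where(mask, lbl_a, lbl_b)
    return mixed_img, mixed_lbl

def biased_contrastive_loss(anchor, positive, negatives,
                            t=0.1, beta=1.0, tau_plus=0.1):
    """Debiased, hardness-weighted InfoNCE for a single anchor.

    anchor, positive: (D,); negatives: (N, D). Vectors are L2-normalized here.
    tau_plus: assumed prior that a sampled "negative" is really a positive.
    beta: how strongly negatives closer to the anchor are upweighted.
    """
    def l2n(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a, p, n = l2n(anchor), l2n(positive), l2n(negatives)
    pos = np.exp(a @ p / t)
    neg = np.exp(n @ a / t)                            # (N,)
    w = neg ** beta                                    # hardness weights
    w = w / w.mean()                                   # harder negatives get weight > 1
    num_neg = len(neg)
    # Remove the expected contribution of false negatives (PU correction).
    ng = (np.sum(w * neg) - num_neg * tau_plus * pos) / (1.0 - tau_plus)
    ng = max(ng, num_neg * np.exp(-1.0 / t))           # clamp at its lower bound
    return -np.log(pos / (pos + ng))
```

In this sketch, negatives that are more similar to the anchor receive larger weights, so they dominate the denominator and produce larger gradients, while the `tau_plus` term subtracts the expected share of same-class features that were mistakenly sampled as negatives.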

Multimodal Contrastive Learning for Unpaired and Depth-privileged Semantic Segmentation

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Understanding Dark Scenes by Contrasting Multi-Modal Observations

Cross-Modal Contrastive Learning for Domain Adaptation in 3D Semantic Segmentation

Confidence-Weighted Dual-Teacher Networks With Biased Contrastive Learning for Semi-Supervised Semantic Segmentation in Remote Sensing Images

Joint Learning of Semantic Segmentation and Height Estimation for Remote Sensing Image Leveraging Contrastive Learning

Directed Mix Contrast for Lidar Point Cloud Segmentation

Spatial and Semantic Consistency Contrastive Learning for Self-Supervised Semantic Segmentation of Remote Sensing Images

Region-aware Contrastive Learning for Semantic Segmentation

P4Contrast: Contrastive Learning with Pairs of Point-Pixel Pairs for RGB-D Scene Understanding

Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning

Multimodal Contrastive Training for Visual Representation Learning

On the Generalization of Multi-modal Contrastive Learning

Contextrast: Contextual Contrastive Learning for Semantic Segmentation

Generalized Semantic Segmentation by Self-Supervised Source Domain Projection and Multi-Level Contrastive Learning

Cross-modal contrastive learning for multimodal sentiment recognition

Multi-modal contrastive mutual learning and pseudo-label re-learning for semi-supervised medical image segmentation

Guided Contrastive Boundary Learning for Semantic Segmentation

Depth Images Could Tell Us More: Enhancing Depth Discriminability for RGB-D Scene Recognition

Color and Geometric Contrastive Learning Based Intra-Frame Supervision for Self-Supervised Monocular Depth Estimation

Multimodal Contrastive Learning via Uni-Modal Coding and Cross-Modal Prediction for Multimodal Sentiment Analysis