Dual Prototypical Contrastive Learning for Few-shot Semantic Segmentation

Hyeongjun Kwon, Somi Jeong, Sunok Kim, Kwanghoon Sohn
DOI: https://doi.org/10.48550/arXiv.2111.04982
2021-11-09
Abstract: We address the problem of few-shot semantic segmentation (FSS), which aims to segment novel class objects in a target image with a few annotated samples. Though recent advances have been made by incorporating prototype-based metric learning, existing methods still show limited performance under extreme intra-class object variations and semantically similar inter-class objects due to their poor feature representation. To tackle this problem, we propose a dual prototypical contrastive learning approach tailored to the FSS task to capture the representative semantic features effectively. The main idea is to make the prototypes more discriminative by increasing inter-class distance while reducing intra-class distance in the prototype feature space. To this end, we first present a class-specific contrastive loss with a dynamic prototype dictionary that stores the class-aware prototypes during training, thus encouraging same-class prototypes to be similar and different-class prototypes to be dissimilar. Furthermore, we introduce a class-agnostic contrastive loss to enhance the generalization ability to unseen classes by compressing the feature distribution of each semantic class within each episode. We demonstrate that the proposed dual prototypical contrastive learning approach outperforms state-of-the-art FSS methods on PASCAL-5i and COCO-20i datasets. The code is available at https://github.com/kwonjunn01/DPCL1.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper addresses is few-shot semantic segmentation (FSS): using a small number of labeled samples (the support set) to segment objects of novel classes in a target image. Although prototype-based metric learning has brought recent progress, existing methods still perform poorly under extreme intra-class object variation and on semantically similar objects from different classes, mainly because their feature representations are weak. To overcome this, the paper proposes Dual Prototypical Contrastive Learning (DPCL), which aims to capture representative semantic features and to improve both the discriminative power and the generalization ability of the feature representation.

### Main contributions of the paper

1. **A dual prototypical contrastive learning framework**
   - **Class-specific contrastive loss**: Prototypes are made more discriminative by increasing the inter-class distance while reducing the intra-class distance. For this purpose, a dynamic prototype dictionary stores class-aware prototypes during training, so that prototypes of the same class are pulled together and prototypes of different classes are pushed apart.
   - **Class-agnostic contrastive loss**: Compressing the feature distribution of each semantic class within each episode enhances generalization to unseen classes.
2. **Improved feature representation**
   - With class-specific and class-agnostic contrastive learning combined, DPCL separates inter-class features and compacts intra-class features in the embedding space more effectively, which improves both discrimination and generalization.
3. **Experimental verification**
   - Extensive experiments on the PASCAL-5i and COCO-20i datasets show that DPCL outperforms existing FSS methods in both the 1-shot and 5-shot settings.

### Key technologies

- **Dynamic prototype dictionary**: Stores class prototypes extracted from past mini-batches, which are used to form positive and negative sample pairs.
- **Momentum encoder**: A momentum update keeps the encoder changing slowly during training, so the stored prototypes remain consistent with the current feature representation.
- **Contrastive loss functions**: The class-specific contrastive loss \( L_{\text{cs-NCE}} \) and the class-agnostic contrastive loss \( L_{\text{ca-NCE}} \) enhance inter-class separation and intra-class compactness, respectively.

### Experimental results

- **PASCAL-5i dataset**:
  - In both the 1-shot and 5-shot settings, DPCL improves mIoU by about 4% over the PANet baseline.
  - Qualitative results show that DPCL produces more accurate segmentations in blurred and difficult regions.
- **COCO-20i dataset**:
  - DPCL improves mIoU by 10.5% and 6.1% over traditional methods in the 1-shot and 5-shot settings, respectively.
  - Although COCO-20i contains more object categories and more challenging samples, DPCL remains robust.

### Conclusion

The dual prototypical contrastive learning method proposed in the paper achieves a significant performance improvement on few-shot semantic segmentation, especially under extreme intra-class variation and for semantically similar objects from different classes. It offers new ideas and methods for research in few-shot learning.
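
### Illustrative sketch

To make the interplay of the prototype dictionary, the momentum update, and the class-specific contrastive loss more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. The names `PrototypeDictionary`, `momentum_update`, and `class_specific_nce`, the FIFO eviction policy, the momentum value, the temperature, and the exact InfoNCE-style form of the loss (averaged over all same-class dictionary entries) are all assumptions made for illustration. The class-agnostic loss is not covered.

```python
import collections
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def momentum_update(key_params, query_params, m=0.999):
    """Exponential moving average: key <- m * key + (1 - m) * query.

    Assumption: parameters are flat lists of floats; m is illustrative.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class PrototypeDictionary:
    """Hypothetical FIFO store of (class_id, prototype) pairs from past batches."""

    def __init__(self, max_size=256):
        self.entries = collections.deque(maxlen=max_size)

    def enqueue(self, class_id, prototype):
        # The oldest entry is dropped automatically once max_size is reached.
        self.entries.append((class_id, l2_normalize(prototype)))


def class_specific_nce(query_proto, class_id, dictionary, temperature=0.07):
    """InfoNCE-style loss: same-class entries are positives, all others negatives.

    Assumed form, not taken from the paper. Returns 0.0 when the dictionary
    holds no positive for class_id.
    """
    q = l2_normalize(query_proto)
    logits = [(dot(q, p) / temperature, c == class_id)
              for c, p in dictionary.entries]
    positives = [logit for logit, is_pos in logits if is_pos]
    if not positives:
        return 0.0
    # Log-sum-exp over all entries, shifted by the max for numerical stability.
    max_logit = max(logit for logit, _ in logits)
    log_denom = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit, _ in logits))
    # Mean of -log(exp(pos) / sum(exp(all))) over the positives.
    return sum(log_denom - logit for logit in positives) / len(positives)


# Toy usage with 2-D prototypes for two classes.
demo_dict = PrototypeDictionary(max_size=8)
demo_dict.enqueue(0, [1.0, 0.0])
demo_dict.enqueue(0, [0.9, 0.1])
demo_dict.enqueue(1, [0.0, 1.0])
demo_dict.enqueue(1, [0.1, 0.9])

query = [1.0, 0.1]  # lies close to the class-0 prototypes
print("loss with matching label   :", class_specific_nce(query, 0, demo_dict))
print("loss with mismatching label:", class_specific_nce(query, 1, demo_dict))
```

In this toy setup the loss is low when the query prototype is labeled with the class whose stored prototypes it resembles, and high otherwise, which is the behavior the class-specific loss is meant to reward. A real implementation would compute prototypes from encoder features and use a tensor library in place of Python lists.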