CP2: Copy-Paste Contrastive Pretraining for Semantic Segmentation

Feng Wang, Huiyu Wang, Chen Wei, Alan Yuille, Wei Shen
DOI: https://doi.org/10.48550/arXiv.2203.11709
2022-08-09
Abstract: Recent advances in self-supervised contrastive learning yield good image-level representation, which favors classification tasks but usually neglects pixel-level detailed information, leading to unsatisfactory transfer performance to dense prediction tasks such as semantic segmentation. In this work, we propose a pixel-wise contrastive learning method called CP2 (Copy-Paste Contrastive Pretraining), which facilitates both image- and pixel-level representation learning and therefore is more suitable for downstream dense prediction tasks. In detail, we copy-paste a random crop from an image (the foreground) onto different background images and pretrain a semantic segmentation model with the objective of 1) distinguishing the foreground pixels from the background pixels, and 2) identifying the composed images that share the same foreground. Experiments show the strong performance of CP2 in downstream semantic segmentation: By finetuning CP2 pretrained models on PASCAL VOC 2012, we obtain 78.6% mIoU with a ResNet-50 and 79.5% with a ViT-S.
Computer Vision and Pattern Recognition, Artificial Intelligence
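
The copy-paste composition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `random_crop` and `paste`, the crop size, the rectangular paste region, and the uniform random placement are all assumptions made for illustration, and the augmentations and contrastive losses used in the paper are omitted. The sketch only shows how one foreground crop pasted onto two different backgrounds yields two composed images plus the binary foreground masks that a pixel-level objective would need.

```python
import numpy as np


def random_crop(image, crop_hw, rng):
    """Return a random crop of size crop_hw = (height, width) from image (H, W, C)."""
    ch, cw = crop_hw
    top = rng.integers(0, image.shape[0] - ch + 1)
    left = rng.integers(0, image.shape[1] - cw + 1)
    return image[top:top + ch, left:left + cw]


def paste(crop, background, rng):
    """Paste crop at a random location in background.

    Returns the composed image and a binary mask (1 = foreground pixel).
    The background array itself is left unmodified.
    """
    ch, cw = crop.shape[:2]
    bh, bw = background.shape[:2]
    top = rng.integers(0, bh - ch + 1)
    left = rng.integers(0, bw - cw + 1)

    composed = background.copy()
    composed[top:top + ch, left:left + cw] = crop

    mask = np.zeros((bh, bw), dtype=np.uint8)
    mask[top:top + ch, left:left + cw] = 1
    return composed, mask


# Hypothetical usage: random arrays stand in for real images.
rng = np.random.default_rng(0)
foreground_image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
background_a = rng.integers(0, 256, size=(96, 96, 3), dtype=np.uint8)
background_b = rng.integers(0, 256, size=(96, 96, 3), dtype=np.uint8)

# One foreground crop, two different backgrounds: the two composed images
# share the same foreground, and each mask separates foreground pixels
# from background pixels.
crop = random_crop(foreground_image, (32, 32), rng)
view_a, mask_a = paste(crop, background_a, rng)
view_b, mask_b = paste(crop, background_b, rng)
```

In this sketch `view_a` and `view_b` would form a pair of composed images sharing a foreground, while `mask_a` and `mask_b` mark which pixels of each composed image belong to that foreground.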
What problem does this paper attempt to address?