Space-correlated Contrastive Representation Learning with Multiple Instances.

Danming Song, Yipeng Gao, Junkai Yan, Wei Sun, Wei-Shi Zheng
DOI: https://doi.org/10.1109/ICPR56361.2022.9956034
2022-01-01
Abstract: Self-supervised contrastive learning methods have shown promising transferability in pretraining by maximizing the mutual information between two cropped regions taken as views of the same image. To extract mutual information between views effectively, these methods assume a priori that the cropped regions contain the same instance. However, data collected in general scenes usually contain multiple instances, so the two cropped regions are likely to contain different instances, which misleads the contrastive learning process. In this paper, we make the first attempt to exploit the spatial position relationships between the two cropped regions in self-supervised contrastive learning on images that include multiple instances. We propose an effective method called Space-correlated Contrastive Learning (SpaceCL). Specifically, given two randomly cropped regions from the same image as a contrastive pair, we implement self-supervised contrastive learning by optimizing a space correspondence contrastive similarity loss. As a result, our method achieves state-of-the-art performance and remarkably outperforms its counterparts when pretrained on the COCO dataset, whose images contain multiple instances. Experiments show that our method outperforms ReSim by 2.6% AP on PASCAL VOC object detection, by 0.8% AP^bb and 0.6% AP^mk on COCO object detection and instance segmentation, and by 1.3% AP^mk on Cityscapes instance segmentation.
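The spatial relationship between two crops of one image can be made concrete with a small geometric sketch. The abstract does not specify how SpaceCL computes its loss, so the code below is not the authors' method: it only shows how two random crops may overlap a little, a lot, or not at all, which is why they can end up covering different instances. The names `Box`, `random_crop`, `intersection`, and `iou` are hypothetical helpers introduced for illustration, and the crop-size rule is an assumption.

```python
import random
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned crop in image pixel coordinates (hypothetical helper)."""
    x0: float
    y0: float
    x1: float
    y1: float


def area(b: Box) -> float:
    return max(0.0, b.x1 - b.x0) * max(0.0, b.y1 - b.y0)


def intersection(a: Box, b: Box) -> Optional[Box]:
    """Return the region shared by both crops, or None if they are disjoint."""
    x0, y0 = max(a.x0, b.x0), max(a.y0, b.y0)
    x1, y1 = min(a.x1, b.x1), min(a.y1, b.y1)
    if x1 <= x0 or y1 <= y0:
        return None
    return Box(x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two crops, in [0, 1]."""
    inter = intersection(a, b)
    if inter is None:
        return 0.0
    inter_area = area(inter)
    return inter_area / (area(a) + area(b) - inter_area)


def random_crop(width: float, height: float, rng: random.Random,
                min_frac: float = 0.3) -> Box:
    """Sample a crop whose sides cover at least min_frac of the image.

    The sampling rule is an assumption for this sketch, not the paper's.
    """
    w = rng.uniform(min_frac, 1.0) * width
    h = rng.uniform(min_frac, 1.0) * height
    x0 = rng.uniform(0.0, width - w)
    y0 = rng.uniform(0.0, height - h)
    return Box(x0, y0, x0 + w, y0 + h)


if __name__ == "__main__":
    rng = random.Random(0)
    view_a = random_crop(640, 480, rng)
    view_b = random_crop(640, 480, rng)
    print("shared region:", intersection(view_a, view_b))
    print("IoU:", iou(view_a, view_b))
```

A pair with a small or empty shared region is the case the abstract warns about: on a multi-instance image, the two views may then depict different objects.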