LPCL: Localized Prominence Contrastive Learning for Self-Supervised Dense Visual Pre-Training

Zihan Chen, Hongyuan Zhu, Hao Cheng, Siya Mi, Yu Zhang, Xin Geng
DOI: https://doi.org/10.1016/j.patcog.2022.109185
IF: 8
2023-01-01
Pattern Recognition
Abstract: Self-supervised pre-training has attracted increasing attention given its promising performance in training backbone networks without using labels. So far, most methods focus on image classification with datasets containing iconic objects and simple backgrounds, e.g. ImageNet. However, these methods show sub-optimal performance on dense prediction tasks (e.g. object detection and scene parsing) when directly pre-trained on datasets with multiple objects and cluttered backgrounds (e.g. PASCAL VOC and COCO). Researchers have explored self-supervised dense pre-training methods by adapting recent image pre-training methods. Nevertheless, these methods require a large number of negative samples and a long training time to reach reasonable performance. In this paper, we propose LPCL, a novel self-supervised representation learning method for dense prediction that addresses these issues. To capture instance information in multi-instance datasets, we define an online object patch selection module that efficiently selects, during learning, the local patches in the augmented views that are most likely to contain instance regions. After obtaining the patches, we present a novel multi-level contrastive learning method that considers instance representations at the global, local and position levels without using negative samples. We conduct extensive experiments with LPCL directly pre-trained on PASCAL VOC and COCO. For the PASCAL VOC image classification task, our model achieves state-of-the-art 86.2% accuracy when pre-trained on COCO (+9.7% top-1 accuracy compared with the BYOL baseline). On object detection, instance segmentation and semantic segmentation tasks, our proposed model also achieves competitive results compared with other state-of-the-art methods. © 2022 Elsevier Ltd. All rights reserved.
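The idea of contrastive learning "without using negative samples" can be illustrated with a minimal sketch. The abstract does not give LPCL's actual loss, so the code below uses the normalized regression loss of BYOL (the baseline the abstract names) as a stand-in, applied to one global pair and several local patch pairs. The function names, the weights `w_global` and `w_local`, and the way the two levels are combined are assumptions made for illustration only; the position-level term is omitted because the abstract does not describe it.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (guarding against a zero norm)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, 1e-12) for x in v]


def negative_free_loss(prediction, target):
    """BYOL-style loss for one positive pair: 2 - 2 * cosine similarity.

    Only the two views of the same content are compared, so no negative
    samples are needed. The value lies in [0, 4].
    """
    p = l2_normalize(prediction)
    z = l2_normalize(target)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))


def multi_level_loss(global_pair, local_pairs, w_global=1.0, w_local=1.0):
    """Hypothetical combination of a global term and averaged local terms.

    global_pair: (prediction, target) embeddings of the two full views.
    local_pairs: list of (prediction, target) embeddings of matched patches.
    """
    loss_global = negative_free_loss(*global_pair)
    loss_local = sum(negative_free_loss(p, z) for p, z in local_pairs)
    loss_local /= max(len(local_pairs), 1)
    return w_global * loss_global + w_local * loss_local


if __name__ == "__main__":
    global_pair = ([1.0, 0.0], [1.0, 0.0])       # identical directions
    local_pairs = [([1.0, 0.0], [0.0, 1.0])]     # orthogonal directions
    print(multi_level_loss(global_pair, local_pairs))
```

In this sketch, aligned embeddings contribute a loss near 0 and opposed embeddings a loss near 4, so minimizing it pulls matched views together without ever comparing against other images.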