SemanticCrop: Boosting Contrastive Learning Via Semantic-Cropped Views.

Ya Fang, Zipeng Chen, Weixuan Tang, Yuan-Gen Wang
DOI: https://doi.org/10.1007/978-981-99-8537-1_27
2024-01-01
Abstract: Siamese-structure-based contrastive learning has shown excellent performance in learning visual representations due to its ability to minimize the distance between positive pairs and increase the distance between negative pairs. Existing works mostly employ RandomCrop or ContrastiveCrop to obtain positive pairs from an image. However, RandomCrop produces cropped views that contain large amounts of uninformative background, while ContrastiveCrop produces positive pairs that are too similar. In this paper, we propose a novel SemanticCrop to yield cropped views containing as much semantic information as possible. Specifically, SemanticCrop first computes a heatmap of an image. Then, an empirical threshold is tuned to box out a semantic region whose heatmap values exceed this threshold. Finally, we design a center-suppressed probabilistic sampling to avoid excessive similarity between positive pairs, so that the cropped views cover more parts of an object. As a plug-and-play module, SemanticCrop improves the accuracy of the MoCo, SimCLR, SimSiam, and BYOL models by 0.5% to 2.34% on the CIFAR10, CIFAR100, IN-200, and IN-1K datasets. The code is available at https://github.com/GZHU-DVL/SemanticCrop .
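The three steps in the abstract (heatmap, thresholded box, center-suppressed sampling) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `semantic_box` and `sample_crop_center` are hypothetical, the heatmap is assumed to be given as a 2D list of values in [0, 1], and the symmetric Beta(alpha, alpha) distribution with alpha < 1 is only one assumed way to push sampled crop centers away from the middle of the box.

```python
import random


def semantic_box(heatmap, threshold):
    """Return (top, left, bottom, right), inclusive, bounding all cells whose
    heatmap value exceeds the threshold. Hypothetical helper for illustration."""
    n_rows, n_cols = len(heatmap), len(heatmap[0])
    rows = [r for r in range(n_rows) if any(v > threshold for v in heatmap[r])]
    cols = [c for c in range(n_cols) if any(row[c] > threshold for row in heatmap)]
    if not rows:
        # Assumed fallback: nothing exceeds the threshold, so use the whole image.
        return 0, 0, n_rows - 1, n_cols - 1
    return rows[0], cols[0], rows[-1], cols[-1]


def sample_crop_center(box, alpha=0.6, rng=random):
    """Sample a crop center (y, x) inside the box.

    Assumption: a symmetric Beta(alpha, alpha) with alpha < 1 is U-shaped,
    so centers near the middle of the box are less likely than centers
    near its edges (a center-suppressed distribution).
    """
    top, left, bottom, right = box
    u = rng.betavariate(alpha, alpha)  # vertical position in [0, 1]
    v = rng.betavariate(alpha, alpha)  # horizontal position in [0, 1]
    return top + u * (bottom - top), left + v * (right - left)


# Toy 5x5 heatmap with a high-response region in rows 1-3, columns 2-3.
heatmap = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.7, 0.6, 0.0],
    [0.0, 0.1, 0.9, 0.8, 0.1],
    [0.0, 0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
box = semantic_box(heatmap, threshold=0.5)
# Two independently sampled centers would seed the two views of a positive pair.
center_a = sample_crop_center(box, rng=random.Random(0))
center_b = sample_crop_center(box, rng=random.Random(1))
```

Because each view's center is drawn independently and tends to land near the border of the semantic box, the two crops are less likely to coincide, which is the effect the abstract attributes to center-suppressed sampling. How the heatmap is computed and how the threshold is tuned are not specified in the abstract, so both are left as inputs here.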