HAPiCLR: heuristic attention pixel-level contrastive loss representation learning for self-supervised pretraining
Van Nhiem Tran, Shen-Hsuan Liu, Chi-En Huang, Muhammad Saqlain Aslam, Kai-Lin Yang, Yung-Hui Li, Jia-Ching Wang
DOI: https://doi.org/10.1007/s00371-023-03217-x
IF: 2.835
2024-03-16
The Visual Computer
Abstract: Recent self-supervised contrastive learning methods are powerful and efficient for robust representation learning: they pull together semantic features from differently cropped views of the same image while pushing features of other images away in the embedding vector space. However, model training for contrastive learning is quite inefficient, since in the high-dimensional vector space images can differ from each other in many ways. We address this problem with heuristic attention pixel-level contrastive loss for representation learning (HAPiCLR), a self-supervised joint embedding contrastive framework that operates at the pixel level and makes use of heuristic mask information. HAPiCLR leverages pixel-level information from the object's contextual representation instead of identifying pair-wise differences in instance-level representations. Thus, HAPiCLR enhances contrastive learning objectives without requiring large batch sizes, memory banks, or queues, thereby reducing the memory footprint and the processing needed for large datasets. Furthermore, combining the HAPiCLR loss with other contrastive objectives such as the SimCLR or MoCo loss produces considerable performance boosts on all downstream tasks, including image classification, object detection, and instance segmentation.
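The abstract describes a contrastive objective that works on pixel-level features and uses a heuristic mask. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: it assumes a binary foreground/background mask per view, averages the pixel embeddings on each side of the mask, and applies an InfoNCE-style loss in which the foreground embeddings of two views form the positive pair and the background embeddings act as negatives. All function names, the temperature value, and the toy data are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def masked_mean(pixels, mask, keep):
    """Average the pixel embeddings whose mask value equals `keep`."""
    chosen = [p for p, m in zip(pixels, mask) if m == keep]
    if not chosen:
        raise ValueError("mask selects no pixels")
    dim = len(chosen[0])
    return [sum(p[d] for p in chosen) / len(chosen) for d in range(dim)]


def info_nce(anchor, positive, negatives, temperature=0.2):
    """InfoNCE loss: -log softmax of the positive among all candidates."""
    a = l2_normalize(anchor)

    def sim(v):
        # Temperature-scaled cosine similarity to the anchor.
        return sum(x * y for x, y in zip(a, l2_normalize(v))) / temperature

    pos = sim(positive)
    logits = [pos] + [sim(n) for n in negatives]
    top = max(logits)  # subtract the max for a stable log-sum-exp
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_denominator - pos


def masked_pixel_contrastive_loss(pixels_a, mask_a, pixels_b, mask_b,
                                  temperature=0.2):
    """Assumed formulation: foreground of both views attract,
    backgrounds of both views serve as negatives."""
    fg_a = masked_mean(pixels_a, mask_a, 1)
    fg_b = masked_mean(pixels_b, mask_b, 1)
    bg_a = masked_mean(pixels_a, mask_a, 0)
    bg_b = masked_mean(pixels_b, mask_b, 0)
    return info_nce(fg_a, fg_b, [bg_a, bg_b], temperature)


# Toy data: each view is a flattened 2x2 feature map of 2-D pixel embeddings.
view_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
view_b = [[0.8, 0.2], [1.0, 0.0], [0.2, 0.8], [0.0, 1.0]]
good_mask = [1, 1, 0, 0]  # foreground = first two pixels
bad_mask = [0, 0, 1, 1]   # deliberately inverted mask for view_b

aligned = masked_pixel_contrastive_loss(view_a, good_mask, view_b, good_mask)
mismatched = masked_pixel_contrastive_loss(view_a, good_mask, view_b, bad_mask)
print(f"aligned masks: {aligned:.4f}  inverted mask: {mismatched:.4f}")
```

In this sketch the negatives come from the same pair of images rather than from other images in the batch, which is one way to read the abstract's claim that large batches, memory banks, or queues are not required. The actual loss in the paper may differ in how pixels are pooled and paired.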
computer science, software engineering