A Novel Self-Learning Network Integrating Contrastive Learning, Perceptual Learning and Masked Image Modelling

Yingxian Chen, Rui Yang, Rushi Lan
DOI: https://doi.org/10.1117/12.3021579
2024-01-01
Abstract: Unsupervised learning methods in computer vision have achieved remarkable success, exceeding the performance of supervised learning methods. Notably, current unsupervised learning methods share certain similarities, particularly in their data augmentation techniques. Masking, a type of data augmentation, can be used for both contrastive learning and masked image modelling. This paper presents a novel deep learning approach to visual unsupervised learning that integrates contrastive learning, perceptual learning, self-distillation and masked image modelling. In our method, the network that processes the original images is treated as the teacher network, and the network that processes the masked images is treated as the student network. The student network uses the representations extracted by its projection head for contrastive learning, while the features generated by its decoder are used for masked image modelling. Self-knowledge distillation is carried out through perceptual learning between the teacher and student networks. The model follows the main idea of contrastive learning, which is to pull similar images closer together while pushing dissimilar images further apart. At the same time, it follows the main idea of masked image modelling, which is to extract semantic information from large-scale masked pixel reconstruction tasks. Additionally, we compare how different self-supervised methods affect the performance of the model. Our results show that with only 75 epochs of fine-tuning, our 29M-parameter model achieves 78.5% top-1 accuracy on the ImageNet-1k dataset.
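The abstract names three training signals (contrastive, masked image modelling, and perceptual self-distillation) but does not give their exact form. The sketch below is only an illustration of how such terms might be combined, not the authors' implementation. It assumes an InfoNCE-style contrastive loss, a mean squared error over masked patches, and a feature-matching perceptual loss. All function names, the temperature, and the equal loss weights are hypothetical choices made for this example.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(student_z, teacher_z, temperature=0.2):
    # Assumed contrastive term: row i of student_z (masked view) should
    # match row i of teacher_z (original view) and differ from other rows.
    s = l2_normalize(student_z)
    t = l2_normalize(teacher_z)
    logits = s @ t.T / temperature                      # shape (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def masked_reconstruction(pred, target, mask):
    # Assumed MIM term: mean squared error over masked patches only.
    # pred, target: (B, N, D) patch pixels; mask: (B, N) with 1 = masked.
    per_patch = ((pred - target) ** 2).mean(axis=-1)
    return float((per_patch * mask).sum() / max(mask.sum(), 1))

def perceptual_distillation(student_feats, teacher_feats):
    # Assumed perceptual term: mean squared distance between matching
    # intermediate feature maps of the student and teacher networks.
    return float(np.mean([np.mean((s - t) ** 2)
                          for s, t in zip(student_feats, teacher_feats)]))

def total_loss(student_z, teacher_z, pred, target, mask,
               student_feats, teacher_feats, weights=(1.0, 1.0, 1.0)):
    # Weighted sum of the three terms; the weights are placeholders.
    w_con, w_mim, w_per = weights
    return (w_con * info_nce(student_z, teacher_z)
            + w_mim * masked_reconstruction(pred, target, mask)
            + w_per * perceptual_distillation(student_feats, teacher_feats))
```

In this sketch the teacher outputs would be treated as fixed targets, as is common in self-distillation setups, with gradients flowing only through the student. The abstract does not state how the teacher is updated, so that detail is left open here.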