Learning What and Where to Learn: A New Perspective on Self-supervised Learning
Wenyi Zhao, Lu Yang, Weidong Zhang, Yongqin Tian, Wenhe Jia, Wei Li, Mu Yang, Xipeng Pan, Huihua Yang
DOI: https://doi.org/10.1109/tcsvt.2023.3298937
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Self-supervised learning (SSL) has demonstrated its power in acquiring generalized models by leveraging the discriminative semantic and explicit positional information of unlabeled datasets. Unfortunately, mainstream contrastive learning-based methods focus excessively on semantic information and ignore that position is also a carrier of image content, resulting in inadequate data utilization and extensive computational consumption. To address these issues, we present an efficient SSL framework, learning What and Where to learn (W2SSL), to aggregate semantic and positional features. Concretely, we devise a spatially-coupled sampling scheme that processes images through pre-defined rules and integrates the advantages of semantic (What) and positional (Where) features into the framework, enriching the diversity of feature representations and improving data utilization. In addition, a spectrum of latent vectors is obtained by mapping the positional features, which implicitly explores the relationships among these vectors. The corresponding discriminative and contrastive optimization objectives are then seamlessly embedded in the framework via a cascade paradigm to explore semantic and positional features. W2SSL is evaluated on different types of datasets, where it outperforms state-of-the-art SSL methods even with half the computational consumption. Code will be available at https://github.com/WilyZhao8/W2SSL.
engineering, electrical & electronic