Abstract: The superiority of deeply learned representations relies on large-scale labeled datasets. However, annotating data is usually expensive or even infeasible in some scenarios. To address this problem, we propose an unsupervised method that leverages instance discrimination and similarity for deep visual representation learning. The method is based on the observation that convolutional neural networks (CNNs) can learn a meaningful visual representation through instancewise classification, in which each instance is treated as an individual class. Through this instancewise discriminative learning, instances become reasonably distributed in the representation space, which reveals their similarities. To further improve visual representations, we propose a dual-level progressive similar instance selection (DPSIS) method that builds a bridge from instance to class by selecting similar instances (neighbors) for each instance (anchor) and treating the anchor and its neighbors as the same class. Specifically, DPSIS adaptively selects two levels of neighbors: 1) an "absolutely similar level" and 2) a "relatively similar level." Instances in the absolutely similar level are used as hard labels, while instances in the relatively similar level are used as soft labels. Moreover, during training, DPSIS progressively selects more neighbors without human supervision. At the beginning of training, because the CNNs are weak, most instances are distributed relatively randomly in the representation space and only a few easy-to-recognize instances are selected as neighbors. As the CNN models become stronger, the semantic meaning of each instance grows clearer, and instances that were originally distributed in a relatively random manner gradually move to meaningful positions. This in turn facilitates CNN training, since the number of reliable samples increases.
Experiments on seven benchmarks, including three small-scale and two large-scale coarse-grained image classification datasets and two fine-grained categorization datasets, demonstrate the effectiveness of DPSIS. Our code has been released at https://github.com/hehefan/DPSIS.
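
The two-level neighbor selection described in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual selection rule, so everything specific here is an assumption made for illustration: the use of cosine similarity, the fixed thresholds `hard_thr` and `soft_thr`, and the function name `select_neighbors` are hypothetical and are not taken from the released DPSIS code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Dot product of two unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def select_neighbors(embeddings, anchor, hard_thr, soft_thr):
    """Split the other instances into two levels by similarity to the anchor.

    Assumption: fixed thresholds stand in for the adaptive rule that the
    abstract mentions but does not specify.
    Returns (hard, soft): `hard` is a list of indices treated as the same
    class as the anchor; `soft` maps indices to similarity scores that could
    serve as soft-label weights.
    """
    feats = [l2_normalize(e) for e in embeddings]
    hard, soft = [], {}
    for j, f in enumerate(feats):
        if j == anchor:
            continue
        s = cosine(feats[anchor], f)
        if s >= hard_thr:
            hard.append(j)   # "absolutely similar level"
        elif s >= soft_thr:
            soft[j] = s      # "relatively similar level"
    return hard, soft


# Toy 2-D embeddings; instance 0 is the anchor.
embeddings = [
    [1.0, 0.0],
    [0.99, 0.05],
    [0.8, 0.6],
    [0.0, 1.0],
]
hard, soft = select_neighbors(embeddings, anchor=0, hard_thr=0.95, soft_thr=0.6)
print(hard, soft)
```

In this toy example, instance 1 lands in the hard level, instance 2 in the soft level, and instance 3 in neither. Under the same assumptions, rerunning the selection on updated embeddings after each training round is one way more neighbors could enter both levels as the representation improves.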

Unsupervised Visual Representation Learning by Graph-Based Consistent Constraints

Self-Supervised Node Representation Learning Via Node-to-Neighbourhood Alignment

Learning the Implicit Semantic Representation on Graph-Structured Data

Deep Unsupervised Learning of Visual Similarities

Unsupervised Visual Representation Learning via Dual-Level Progressive Similar Instance Selection

Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning

Towards Unsupervised Representation Learning: Learning, Evaluating and Transferring Visual Representations

Deep Graph Contrastive Representation Learning

Multi-View Graph Embedding Learning for Image Co-Segmentation and Co-Localization

Visual-Semantic Graph Matching for Visual Grounding

Unsupervised Visual Representation Learning by Context Prediction

GRLC: Graph Representation Learning With Constraints

Contrastive Multi-View Representation Learning on Graphs

ULD-Net: 3D unsupervised learning by dense similarity learning with equivariant-crop

Graph Contrastive Learning with Constrained Graph Data Augmentation

Learning to Associate Words and Images Using a Large-scale Graph

Self-Supervised Graph Representation Learning via Global Context Prediction

Self-supervised Graph-level Representation Learning with Adversarial Contrastive Learning

Unsupervised Object-Centric Learning from Multiple Unspecified Viewpoints

Graph Representation Learning Meets Computer Vision: A Survey

Consistency Graph Modeling for Semantic Correspondence