HMSN: Hyperbolic Self-Supervised Learning by Clustering with Ideal Prototypes

Aiden Durrant, Georgios Leontidis
2023-05-18
Abstract: Hyperbolic manifolds for visual representation learning allow for effective learning of semantic class hierarchies by naturally embedding tree-like structures with low distortion within a low-dimensional representation space. The highly separable semantic class hierarchies produced by hyperbolic learning have been shown to be powerful in low-shot tasks; however, their application in self-supervised learning is yet to be fully explored. In this work, we explore the use of hyperbolic representation space for self-supervised representation learning for prototype-based clustering approaches. First, we extend the Masked Siamese Networks to operate on the Poincaré ball model of hyperbolic space; second, we place prototypes on the ideal boundary of the Poincaré ball. Unlike previous methods, we project to the hyperbolic space at the output of the encoder network and utilise a hyperbolic projection head to ensure that the representations used for downstream tasks remain hyperbolic. Empirically, we demonstrate the ability of these methods to perform comparably to Euclidean methods in lower dimensions for linear evaluation tasks, whilst showing improvements in extreme few-shot learning tasks.
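As a rough illustration of the projection step described in the abstract, the sketch below maps Euclidean encoder outputs onto the Poincaré ball with the exponential map at the origin, a common choice in hyperbolic learning. The abstract does not state which map or curvature the authors use, so the function names `expmap0` and `poincare_distance`, the curvature parameter `c`, and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-7):
    """Exponential map at the origin of the Poincaré ball (curvature -c).

    Takes Euclidean vectors v of shape (..., d) and returns points whose
    norm is strictly below 1/sqrt(c). Assumed projection, not confirmed
    by the source.
    """
    sqrt_c = np.sqrt(c)
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    norm = np.maximum(norm, eps)  # avoid division by zero at the origin
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_distance(x, y, c=1.0):
    """Geodesic distance between points x and y inside the ball."""
    sqrt_c = np.sqrt(c)
    diff2 = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - c * np.sum(x ** 2, axis=-1)) * (1.0 - c * np.sum(y ** 2, axis=-1))
    return np.arccosh(1.0 + 2.0 * c * diff2 / denom) / sqrt_c

# Hypothetical encoder outputs: two 2-dimensional Euclidean vectors.
features = np.array([[0.3, 0.4], [3.0, 4.0]])
points = expmap0(features)
print(np.linalg.norm(points, axis=-1))  # both norms stay below 1
```

With `c = 1`, the hyperbolic distance from the origin to `expmap0(v)` equals twice the Euclidean norm of `v`, so large Euclidean features land very close to the boundary of the ball without ever reaching it.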
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Main Problems Addressed by the Paper

This paper primarily addresses the issue of leveraging hyperbolic space in self-supervised learning (SSL) to better embed the semantic hierarchical structure in natural images.

### Specific Goals and Contributions

1. **Proposing Hyperbolic Masked Siamese Networks (HMSN)**:
   - Extending Masked Siamese Networks (MSNs) to hyperbolic space (Poincaré ball model) to utilize the low-distortion embedding capability of hyperbolic space for tree-like structures.
   - Demonstrating through experiments that HMSN can perform comparably to Euclidean methods in linear evaluation tasks with fewer dimensions and show improvements in extreme few-shot learning tasks.
2. **Introducing Ideal Prototypes**:
   - Placing prototypes on the ideal boundary of the Poincaré ball to encourage full utilization of hyperbolic space.
   - Proposing a new loss function based on the Busemann distance metric to train the network to produce good hyperbolic representations.
3. **Proposing Hyperbolic Projection Head**:
   - Projecting Euclidean representations to the hyperbolic space of the Poincaré ball at the encoder output to ensure that the representations used in downstream tasks retain hyperbolic properties.
   - Using a fully hyperbolic projection network to ensure that the learned hyperbolicity can be utilized in downstream tasks.

### Overview of Experimental Results

- **Linear Classification**: HMSN-IP performs similarly to the MSN baseline in linear classification on the ImageNet-1K dataset but uses fewer embedding dimensions (64 dimensions compared to 256 dimensions).
- **Few-Shot Linear Classification**: HMSN-IP outperforms its Euclidean baseline in few-shot linear classification tasks using only 1% of labeled samples, with a performance improvement of 1.0%.
- **Impact of Projection Head**: HMSN-IP with a hyperbolic projection head achieves higher performance under a hyperbolic linear classifier, demonstrating the importance of hyperbolic properties in downstream tasks.

In summary, by introducing the concept of hyperbolic space and corresponding technical improvements, this paper aims to enhance the performance of self-supervised learning methods in few-shot learning tasks and validates the effectiveness of the proposed improvements through experiments.
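To make the ideal-prototype idea more concrete, the sketch below evaluates the standard Busemann function on the unit Poincaré ball, b_p(z) = log(‖p − z‖² / (1 − ‖z‖²)), for prototypes p placed on the boundary (‖p‖ = 1), and turns the negated values into soft cluster assignments. The summary does not give the exact loss used by HMSN-IP, so the softmax assignment, the temperature `tau`, and the function names `busemann` and `prototype_assignments` are assumptions for illustration rather than the authors' formulation.

```python
import numpy as np

def busemann(z, p, eps=1e-7):
    """Busemann function on the unit Poincaré ball.

    z: (n, d) points strictly inside the ball.
    p: (k, d) ideal prototypes on the boundary, each with unit norm.
    Returns an (n, k) array; lower values mean z is "closer" to p.
    """
    diff2 = np.sum((z[:, None, :] - p[None, :, :]) ** 2, axis=-1)  # (n, k)
    denom = 1.0 - np.sum(z ** 2, axis=-1, keepdims=True)           # (n, 1)
    return np.log(diff2 + eps) - np.log(np.maximum(denom, eps))

def prototype_assignments(z, p, tau=0.1):
    """Soft assignments over prototypes (assumed form, not from the paper)."""
    logits = -busemann(z, p) / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Two hypothetical ideal prototypes on the unit circle and two embeddings.
prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])
embeddings = np.array([[0.0, 0.0], [0.5, 0.0]])
print(busemann(embeddings, prototypes))
print(prototype_assignments(embeddings, prototypes))
```

At the origin the Busemann function is zero for every prototype, and it decreases without bound as an embedding moves toward a given ideal point, which is what pushes representations outward toward the boundary of the ball.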