Abstract: In this paper, we propose an algorithm that can be used on top of a wide variety of self-supervised learning (SSL) approaches to take advantage of hierarchical structures that emerge during training. SSL approaches typically work through some invariance term to ensure consistency between similar samples and a regularization term to prevent global dimensional collapse. Dimensional collapse refers to data representations spanning a lower-dimensional subspace. Recent work has demonstrated that the representation space of these algorithms gradually reflects a semantic hierarchical structure as training progresses. Data samples of the same hierarchical grouping tend to exhibit greater dimensional collapse locally compared to the dataset as a whole because they share features in common with each other. Ideally, SSL algorithms would take advantage of this hierarchical emergence through an additional regularization term that accounts for this local dimensional collapse effect. However, the construction of existing SSL algorithms does not account for this property. To address this, we propose an adaptive algorithm that performs a weighted decomposition of the denominator of the InfoNCE loss into two terms: local hierarchical regularization and global collapse regularization, respectively. This decomposition is based on an adaptive threshold that gradually lowers to reflect the emerging hierarchical structure of the representation space throughout training. The threshold is derived from an analysis of the cosine similarity distribution of samples in a batch. We demonstrate that this hierarchical emergence exploitation (HEX) approach can be integrated across a wide variety of SSL algorithms. Empirically, we show relative improvements of up to 5.6% over baseline SSL approaches in classification accuracy on ImageNet with 100 epochs of training.
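To make the notion of dimensional collapse in the abstract concrete, here is a minimal sketch of one common way to quantify it: the effective rank, taken as the exponential of the Shannon entropy of the normalized singular values. The function name `effective_rank`, the mean-centering step, and the synthetic data are assumptions for illustration; the abstract does not say how the authors measure collapse.

```python
import numpy as np

def effective_rank(Z, eps=1e-12):
    """Effective rank of an (n_samples, dim) representation matrix.

    Computed as exp(entropy) of the singular values normalized to sum to 1.
    Mean-centering is an illustrative choice, not taken from the paper.
    """
    Z = Z - Z.mean(axis=0, keepdims=True)
    s = np.linalg.svd(Z, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero directions
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)

# Representations that use all 32 dimensions.
full = rng.normal(size=(256, 32))

# Representations confined to a 4-dimensional subspace of the same space,
# i.e. a synthetic example of dimensional collapse.
basis = rng.normal(size=(4, 32))
collapsed = rng.normal(size=(256, 4)) @ basis

rank_full = effective_rank(full)
rank_collapsed = effective_rank(collapsed)
print(rank_full, rank_collapsed)
```

Applied to the representations of a single hierarchical group versus the whole dataset, a lower value for the group would indicate the local collapse the abstract describes.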

What problem does this paper attempt to address?

The problem this paper addresses is how, in self-supervised learning (SSL), to use the hierarchical structures that gradually emerge during training to improve existing SSL algorithms. Specifically, existing SSL methods usually ensure consistency between similar samples through an invariance term and prevent global dimensional collapse through a regularization term. However, these methods do not account for the local hierarchical structures that gradually form in the representation space, which lead to local dimensional collapse. The paper proposes a new algorithm, HEX (Hierarchical Emergence Exploitation), which addresses this problem by introducing local hierarchical regularization.

### Main Contributions

1. **Hierarchical Structure Identification**: Based on the cosine-similarity distribution, a method is proposed to identify the local hierarchical structures in the representation space.
2. **Local Hierarchical Regularization**: The identified hierarchical structures are integrated into the InfoNCE loss function, introducing local hierarchical regularization to combat local dimensional collapse.
3. **Adaptive Threshold**: An adaptive threshold parameter \(\epsilon\) is introduced and gradually adjusted during training to reflect the gradual formation of hierarchical structures in the representation space.
4. **Performance Improvement**: The effectiveness of HEX is demonstrated on various classification tasks, especially fine-grained recognition, large-scale evaluation, and diverse task types.

### Method Overview

1. **Hierarchical Analysis**:
   - Experiments use the CIFAR-100 dataset, which has a natural hierarchical structure.
   - Analyzing the effective rank of the sample-representation matrices at different training stages shows that samples in the same super-category are more likely to collapse in local dimensions.
   - Samples belonging to the same hierarchical structure are identified through changes in the cosine-similarity distribution.
2. **HEX Loss Function**:
   - The standard contrastive loss function (InfoNCE loss) can be decomposed into an invariance term and a regularization term.
   - A weight function \(Q_{hi}\) is introduced to weight the negative samples belonging to the same hierarchical structure, increasing the local regularization effect.
   - Samples are divided into hierarchical-structure samples and ordinary samples by setting the threshold \(\epsilon\) adaptively or manually.
3. **Experimental Results**:
   - Experiments on the CIFAR-100 and ImageNet-100 datasets demonstrate the performance improvement of HEX on various SSL algorithms.
   - The results show that HEX significantly improves performance over the baseline methods, especially in fine-grained recognition tasks.

### Formula Presentation

- **InfoNCE Loss**:
  \[ L_{\text{NCE}} = -\sum_{x_i \in I} \log \left( \frac{\exp(z_i \cdot z_{j(i)} / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)} \right) \]
- **HEX Loss**:
  \[ L_{\text{HEX}} = -\sum_{x_i \in I} \log \left( \frac{\exp(z_i \cdot z_{j(i)} / \tau)}{Q_{hi} \left( \sum_{h \in H(i)} \exp(z_i \cdot z_h / \tau) \right) + \sum_{n \notin H(i)} \exp(z_i \cdot z_n / \tau)} \right) \]
  where \(Q_{hi}\) is a weight function, defined as:
  \[ Q_{hi} = \frac{\sum_{h \in H(i)} \exp(z_i \cdot z_h / \tau) \, (z_i \cdot z_h / \tau)}{N \sum_{h \in H(i)} \exp(z_i \cdots} \]
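The decomposition of the InfoNCE denominator in the HEX loss above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the function `hex_style_loss` and its parameters are hypothetical, \(H(i)\) is assumed to be the set of non-positive samples whose cosine similarity to \(z_i\) exceeds the threshold \(\epsilon\), and because the definition of \(Q_{hi}\) is cut off in the text, the weight is replaced by a constant scalar `q`.

```python
import numpy as np

def hex_style_loss(z, pos_idx, tau=0.5, eps_thresh=0.7, q=1.0):
    """InfoNCE with the denominator split into a weighted 'hierarchical'
    part and an unweighted remainder.

    z          : (n, d) embeddings; rows are L2-normalized inside.
    pos_idx    : (n,) index j(i) of the positive for each anchor i.
    eps_thresh : cosine-similarity threshold defining H(i) (assumption).
    q          : constant stand-in for the weight Q_hi (assumption).
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T                                 # pairwise cosine similarity
    n = z.shape[0]
    idx = np.arange(n)

    not_self = ~np.eye(n, dtype=bool)             # A(i): everything except i
    is_pos = np.zeros((n, n), dtype=bool)
    is_pos[idx, pos_idx] = True

    in_h = (sim > eps_thresh) & not_self & ~is_pos  # assumed H(i)
    rest = not_self & ~in_h                         # positive + other negatives

    e = np.exp(sim / tau)
    denom = q * (e * in_h).sum(axis=1) + (e * rest).sum(axis=1)
    log_prob = sim[idx, pos_idx] / tau - np.log(denom)
    return float(-log_prob.sum())

# Two noisy views of the same 8 base vectors; view i pairs with view i + B.
rng = np.random.default_rng(0)
B, d = 8, 16
base = rng.normal(size=(B, d))
z = np.concatenate([base + 0.1 * rng.normal(size=(B, d)),
                    base + 0.1 * rng.normal(size=(B, d))])
pos_idx = (np.arange(2 * B) + B) % (2 * B)

loss_nce = hex_style_loss(z, pos_idx, q=1.0)                  # plain InfoNCE
loss_hex = hex_style_loss(z, pos_idx, eps_thresh=0.0, q=2.0)  # upweighted H(i)
print(loss_nce, loss_hex)
```

With `q=1.0` the two sums recombine into the ordinary InfoNCE denominator, so the sketch reduces to the baseline loss; a weight above 1 penalizes similarity to samples inside the assumed \(H(i)\) more strongly, which is the local regularization effect the summary describes. Lowering `eps_thresh` over training would mimic the adaptive threshold, though the schedule itself is not given here.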

HEX: Hierarchical Emergence Exploitation in Self-Supervised Algorithms

Neural collapse inspired semi-supervised learning with fixed classifier

On Improving the Algorithm-, Model-, and Data- Efficiency of Self-Supervised Learning

Learning Where to Learn in Cross-View Self-Supervised Learning

Addressing Sample Inefficiency in Multi-View Representation Learning

Mind Your Augmentation: the Key to Decoupling Dense Self-Supervised Learning

De-coupling and De-positioning Dense Self-supervised Learning

Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization

On the Discriminability of Self-Supervised Representation Learning

CRLSTM-HEXNET: Hybrid Deep Learning Framework with Harris Hawk Optimization in Multi-Label Classification

The Common Stability Mechanism behind most Self-Supervised Learning Approaches

More Synergy, Less Redundancy: Exploiting Joint Mutual Information for Self-Supervised Learning

Can We Break Free from Strong Data Augmentations in Self-Supervised Learning?

Augmentations vs Algorithms: What Works in Self-Supervised Learning

EMP-SSL: Towards Self-Supervised Learning in One Training Epoch

Self-Supervised Anomaly Detection in the Wild: Favor Joint Embeddings Methods

Weak Augmentation Guided Relational Self-Supervised Learning

Semi Supervised Heterogeneous Domain Adaptation via Disentanglement and Pseudo-Labelling

Adaptive Hierarchical Certification for Segmentation using Randomized Smoothing

MSR: Making Self-supervised learning Robust to Aggressive Augmentations

ReSSL: Relational Self-Supervised Learning with Weak Augmentation