Abstract: Several self-supervised learning (SSL) approaches have shown that redundancy reduction in the feature embedding space is an effective tool for representation learning. However, these methods consider a narrow notion of redundancy, focusing on pairwise correlations between features. To address this limitation, we formalize the notion of embedding space redundancy and introduce redundancy measures that capture more complex, higher-order dependencies. We mathematically analyze the relationships between these metrics, and empirically measure these redundancies in the embedding spaces of common SSL methods. Based on our findings, we propose Self Supervised Learning with Predictability Minimization (SSLPM) as a method for reducing redundancy in the embedding space. SSLPM combines an encoder network with a predictor engaging in a competitive game of reducing and exploiting dependencies respectively. We demonstrate that SSLPM is competitive with state-of-the-art methods and find that the best performing SSL methods exhibit low embedding space redundancy, suggesting that even methods without explicit redundancy reduction mechanisms perform redundancy reduction implicitly.

What problem does this paper attempt to address?

This paper addresses redundancy in the feature embedding space of self-supervised learning (SSL). Existing SSL methods reduce redundancy mainly by targeting pairwise correlations between features, which ignores higher-order redundancy (complex dependencies involving more than two features) and nonlinear redundancy. To address this limitation, the authors pose the following research questions:

1. **How can redundancy in the SSL embedding space be quantified?**
2. **How does redundancy in the embedding space affect the performance of downstream tasks?**
3. **Can performance be further improved by eliminating more complex redundancies?**

To answer these questions, the authors carried out the following work:

1. **Introduced a formal definition of embedding space redundancy**: covering pairwise, linear, and nonlinear redundancy, and derived the theoretical relationships between these metrics.
2. **Proposed a new SSL method based on predictability minimization (SSLPM)**: it reduces redundancy in the embedding space through a competitive game between the encoder and the predictor.
3. **Empirically analyzed the relationship between model performance and embedding space redundancy for multiple SSL methods**: including Barlow Twins, BYOL, NNCLR, SimCLR, MocoV3, VICReg, and VIbCReg.

Through this work, the authors aim to reveal the role of redundancy in SSL and to explore whether the performance of SSL methods can be improved through more complex redundancy reduction mechanisms.

### Main Findings

- **Reducing additional redundancy does not necessarily lead to higher downstream performance**: Experimental results show that reducing additional redundancy during training does not significantly improve the performance of downstream tasks.
- **Methods with explicit redundancy reduction show a clear connection between performance and linear redundancy**: This holds for Barlow Twins and SSLPM, for example, but the connection is not universal.
- **Even without an explicit redundancy reduction mechanism, the best-performing SSL methods also show low embedding space redundancy**: This indicates that some methods may reduce redundancy implicitly.
- **The depth of the projector has a significant impact on redundancy reduction**: More projector layers can reduce the linear and nonlinear redundancies in the embedding space.

In summary, this paper examines the importance of redundancy reduction in SSL and its impact on model performance by introducing new redundancy metrics and a new method.
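
To illustrate why pairwise correlations alone can miss the higher-order, nonlinear redundancy discussed above, here is a minimal toy sketch in plain Python. It is not the paper's metric or implementation: the three-feature "embedding", the function names `pearson`, `pairwise_redundancy`, and `product_predictability`, and the choice of mean absolute correlation and R² as scores are all assumptions made for illustration.

```python
from itertools import product
from math import sqrt

# Toy "embedding" (an assumption for illustration): z1 and z2 are independent
# signs, and z3 is their product, so z3 is fully determined by (z1, z2).
samples = [(z1, z2, z1 * z2) for z1, z2 in product((-1.0, 1.0), repeat=2)]
columns = list(zip(*samples))  # one tuple of values per feature


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pairwise_redundancy(cols):
    """Mean absolute correlation over all distinct feature pairs (hypothetical score)."""
    d = len(cols)
    pairs = [(i, j) for i in range(d) for j in range(i + 1, d)]
    return sum(abs(pearson(cols[i], cols[j])) for i, j in pairs) / len(pairs)


def product_predictability(rows):
    """R^2 when z3 is predicted by the hand-picked nonlinear rule z1 * z2."""
    targets = [r[2] for r in rows]
    preds = [r[0] * r[1] for r in rows]
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, preds))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


print("pairwise redundancy:", pairwise_redundancy(columns))
print("nonlinear predictability of z3:", product_predictability(samples))
```

In this toy case every pairwise correlation vanishes, yet the third feature is perfectly predictable from the other two. A measure based only on pairwise correlations would therefore report no redundancy here, while a predictor that can model the product would detect it.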

Beyond Pairwise Correlations: Higher-Order Redundancies in Self-Supervised Representation Learning

On Feature Decorrelation in Self-Supervised Learning

More Synergy, Less Redundancy: Exploiting Joint Mutual Information for Self-Supervised Learning

Low-Rank Approximation of Structural Redundancy for Self-Supervised Learning

ReSSL: Relational Self-Supervised Learning with Weak Augmentation

Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction

Weak Augmentation Guided Relational Self-Supervised Learning

Learning Where to Learn in Cross-View Self-Supervised Learning

On the Discriminability of Self-Supervised Representation Learning

Learning Disentangled Representation with Pairwise Independence

Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods

Mitigating Spurious Correlations for Self-supervised Recommendation

Rethinking Self-Supervised Learning Within the Framework of Partial Information Decomposition

Redundant Correlation Effect on Personalized Recommendation

Making Self-supervised Learning Robust to Spurious Correlation via Learning-speed Aware Sampling

Triplet is All You Need with Random Mappings for Unsupervised Visual Representation Learning

A Probabilistic Model Behind Self-Supervised Learning

An Empirical Study of Self-Supervised Learning with Wasserstein Distance

Rethinking Self-Supervised Learning: Small is Beautiful

Understanding the Role of Equivariance in Self-supervised Learning