Abstract: Non-contrastive SSL methods like BYOL and SimSiam rely on asymmetric predictor networks to avoid representational collapse without negative samples. Yet, how predictor networks facilitate stable learning is not fully understood. While previous theoretical analyses assumed Euclidean losses, most practical implementations rely on cosine similarity. To gain further theoretical insight into non-contrastive SSL, we analytically study learning dynamics in conjunction with Euclidean and cosine similarity in the eigenspace of closed-form linear predictor networks. We show that both avoid collapse through implicit variance regularization albeit through different dynamical mechanisms. Moreover, we find that the eigenvalues act as effective learning rate multipliers and propose a family of isotropic loss functions (IsoLoss) that equalize convergence rates across eigenmodes. Empirically, IsoLoss speeds up the initial learning dynamics and increases robustness, thereby allowing us to dispense with the EMA target network typically used with non-contrastive methods. Our analysis sheds light on the variance regularization mechanisms of non-contrastive SSL and lays the theoretical grounds for crafting novel loss functions that shape the learning dynamics of the predictor's spectrum.
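The closed-form linear predictor and the two asymmetric losses mentioned in the abstract can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a DirectPred-style predictor `W_P = C**alpha`, built from the eigendecomposition of the representation correlation matrix `C = E[z z^T]` (DirectPred, from *Understanding self-supervised Learning Dynamics without Contrastive Pairs*, uses `alpha = 0.5`), and the helper names `closed_form_predictor`, `euclidean_loss` and `cosine_loss` are hypothetical. The target view `z2` stands in for the stop-gradient branch; since NumPy has no autodiff, the sketch only evaluates the losses.

```python
import numpy as np


def closed_form_predictor(z, alpha=0.5, eps=1e-8):
    """Hypothetical helper: build a symmetric predictor W_P = C**alpha.

    z: array of shape (batch, dim) holding online representations.
    C is the uncentered correlation matrix E[z z^T]. The matrix power is
    taken in C's eigenbasis, so W_P shares C's eigenvectors and has
    eigenvalues lam**alpha (an assumed DirectPred-style parametrization).
    """
    c = z.T @ z / z.shape[0]
    eigvals, eigvecs = np.linalg.eigh(c)       # C = U diag(lam) U^T
    eigvals = np.clip(eigvals, eps, None)      # guard tiny/negative values
    w_p = eigvecs @ np.diag(eigvals ** alpha) @ eigvecs.T
    return w_p, eigvals


def euclidean_loss(w_p, z1, z2):
    """Asymmetric Euclidean loss ||W_P z1 - z2||^2, averaged over the batch.

    z2 plays the role of the stop-gradient target: it is treated as a
    constant and would not be differentiated through in training.
    """
    pred = z1 @ w_p.T                          # row i is (W_P z1_i)^T
    return float(np.mean(np.sum((pred - z2) ** 2, axis=1)))


def cosine_loss(w_p, z1, z2, eps=1e-8):
    """Asymmetric negative cosine similarity between W_P z1 and target z2."""
    pred = z1 @ w_p.T
    num = np.sum(pred * z2, axis=1)
    den = np.linalg.norm(pred, axis=1) * np.linalg.norm(z2, axis=1) + eps
    return float(-np.mean(num / den))


# Example: two noisy "views" of the same batch of representations.
rng = np.random.default_rng(0)
z1 = rng.normal(size=(256, 8))
z2 = z1 + 0.1 * rng.normal(size=(256, 8))
w_p, lam = closed_form_predictor(z1, alpha=0.5)
print(euclidean_loss(w_p, z1, z2), cosine_loss(w_p, z1, z2))
```

Under this assumed parametrization, the predictor's eigenvalues are `lam**alpha`, which are the eigenvalues whose role as effective learning rate multipliers the abstract refers to.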

What problem does this paper attempt to address?

The problem this paper addresses is representational collapse in non-contrastive self-supervised learning (SSL). Specifically, the paper explores how non-contrastive SSL methods such as BYOL and SimSiam avoid representational collapse without negative samples through implicit variance regularization. It focuses on the learning dynamics under the Euclidean loss and the cosine similarity loss, and proposes a new family of isotropic loss functions (IsoLoss) that speed up the initial learning dynamics and improve robustness, so that non-contrastive methods can be trained without an exponential moving average (EMA) target network.

### Main contributions of the paper:

1. **Analysis of SSL dynamics**: In the eigenspace of a closed-form linear predictor, the paper analyzes the SSL dynamics under the asymmetric Euclidean loss and the cosine loss, and shows that both losses achieve implicit variance regularization, albeit through different dynamical mechanisms.
2. **Eigenvalues as learning rate multipliers**: The paper finds that the eigenvalues of the predictor act as effective learning rate multipliers, so eigenmodes with small eigenvalues learn more slowly.
3. **Proposing isotropic loss functions**: Based on this analysis, the paper designs a new family of isotropic loss functions (IsoLoss) that equalize the convergence rates of the different eigenmodes, improve robustness, and allow learning without an EMA target network.

### Key technical details:

- **Eigenspace analysis**: Using neural tangent kernel (NTK) theory, the paper derives expressions for the learning dynamics under the Euclidean and cosine losses in the eigenspace of the predictor, revealing the mechanism of implicit variance regularization.
- **Isotropic loss function**: The isotropic loss functions are designed to remove the influence of the eigenvalues as learning rate multipliers, making the learning dynamics uniform across all eigenmodes.
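The idea that eigenvalues act as learning rate multipliers, and that an isotropic loss equalizes convergence across eigenmodes, can be illustrated with a toy model. This is a deliberately simplified analogy, not the paper's derivation or its IsoLoss: it assumes each eigenmode relaxes independently toward a fixed target at a rate proportional to its eigenvalue, and the function `simulate_modes` is hypothetical. Setting `isotropic=True` divides out the eigenvalue, so every mode relaxes at the same rate.

```python
def simulate_modes(eigenvalues, target=1.0, lr=0.1, steps=20, isotropic=False):
    """Hypothetical toy: Euler-integrate independent eigenmode dynamics.

    Each mode m follows dx_m/dt = -g_m * (x_m - target), where the gain
    g_m = lambda_m (the eigenvalue acts as a learning rate multiplier),
    or g_m = 1 when isotropic=True (the multiplier is divided out).
    Returns the remaining distance |x_m - target| for each mode.
    """
    residuals = []
    for lam in eigenvalues:
        gain = 1.0 if isotropic else lam
        x = 0.0                                # every mode starts at 0
        for _ in range(steps):
            x -= lr * gain * (x - target)      # one Euler step
        residuals.append(abs(x - target))
    return residuals


eigs = [2.0, 1.0, 0.1]
print(simulate_modes(eigs))                   # small-eigenvalue modes lag behind
print(simulate_modes(eigs, isotropic=True))   # all modes converge at the same rate
```

With a learning rate of 0.1 and 20 steps, the mode with eigenvalue 2.0 retains under 2% of its initial distance, while the mode with eigenvalue 0.1 still retains about 82%; in the isotropic run, all three residuals are identical.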
### Experimental verification:

- **Linear network experiments**: Experiments on a simple linear Siamese network verify the theoretical analysis.
- **Non-linear network experiments**: Experiments on real-world datasets such as CIFAR-10, CIFAR-100, STL-10 and TinyImageNet with a ResNet-18 backbone show that the isotropic loss functions speed up the initial learning dynamics, improve model performance, and train stably without an EMA target network.

### Conclusion:

Through theoretical analysis and experiments, the paper shows that non-contrastive SSL avoids representational collapse through implicit variance regularization, and that the isotropic loss functions (IsoLoss) accelerate the learning dynamics and improve the robustness of the model. These findings lay theoretical groundwork for designing more effective non-contrastive self-supervised learning methods.

Implicit variance regularization in non-contrastive SSL

Understanding self-supervised Learning Dynamics without Contrastive Pairs

The Common Stability Mechanism behind most Self-Supervised Learning Approaches

The Edge of Orthogonality: A Simple View of What Makes BYOL Tick

Understanding the Role of Equivariance in Self-supervised Learning

Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization

Contrastive Learning with Synthetic Positives

An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization

Preventing Collapse in Contrastive Learning with Orthonormal Prototypes (CLOP)

FroSSL: Frobenius Norm Minimization for Efficient Multiview Self-Supervised Learning

Learning Weakly Convex Regularizers for Convergent Image-Reconstruction Algorithms

The Hidden Pitfalls of the Cosine Similarity Loss

Towards a Unified Theoretical Understanding of Non-contrastive Learning via Rank Differential Mechanism

On the duality between contrastive and non-contrastive self-supervised learning

Implicit Regularization in ReLU Networks with the Square Loss

GBVSSL: Contrastive Semi-Supervised Learning Based on Generalized Bias-Variance Decomposition

SigCLR: Sigmoid Contrastive Learning of Visual Representations

Regularizing with Pseudo-Negatives for Continual Self-Supervised Learning

Learning Continually by Spectral Regularization

Semi-Supervised Empirical Risk Minimization: Using unlabeled data to improve prediction