Implicit variance regularization in non-contrastive SSL

Manu Srinath Halvagal, Axel Laborieux, Friedemann Zenke
2023-10-27
Abstract: Non-contrastive SSL methods like BYOL and SimSiam rely on asymmetric predictor networks to avoid representational collapse without negative samples. Yet, how predictor networks facilitate stable learning is not fully understood. While previous theoretical analyses assumed Euclidean losses, most practical implementations rely on cosine similarity. To gain further theoretical insight into non-contrastive SSL, we analytically study learning dynamics in conjunction with Euclidean and cosine similarity in the eigenspace of closed-form linear predictor networks. We show that both avoid collapse through implicit variance regularization albeit through different dynamical mechanisms. Moreover, we find that the eigenvalues act as effective learning rate multipliers and propose a family of isotropic loss functions (IsoLoss) that equalize convergence rates across eigenmodes. Empirically, IsoLoss speeds up the initial learning dynamics and increases robustness, thereby allowing us to dispense with the EMA target network typically used with non-contrastive methods. Our analysis sheds light on the variance regularization mechanisms of non-contrastive SSL and lays the theoretical grounds for crafting novel loss functions that shape the learning dynamics of the predictor's spectrum.
Machine Learning, Artificial Intelligence, Neural and Evolutionary Computing
What problem does this paper attempt to address?
This paper addresses representational collapse in non-contrastive self-supervised learning (SSL). Specifically, it examines how non-contrastive methods such as BYOL and SimSiam avoid collapse without negative samples, and shows that they do so through implicit variance regularization. The paper analyzes the learning dynamics under Euclidean and cosine-similarity losses and proposes a new family of isotropic loss functions (IsoLoss) that speeds up the initial learning dynamics and increases robustness, so that non-contrastive methods can be trained without an exponential moving average (EMA) target network.

### Main contributions of the paper:

1. **Analysis of SSL dynamics**: In the eigenspace of a closed-form linear predictor, the paper analyzes the SSL dynamics under the asymmetric Euclidean loss and the cosine loss, and shows that both losses achieve implicit variance regularization, although through different dynamical mechanisms.
2. **Eigenvalues as learning rate multipliers**: The paper finds that the eigenvalues of the predictor act as effective learning rate multipliers, so eigenmodes with small eigenvalues are learned more slowly.
3. **Proposing isotropic loss functions**: Based on this analysis, the paper designs a family of isotropic loss functions (IsoLoss) that equalizes convergence rates across eigenmodes, increases robustness, and allows learning without an EMA target network.

### Key technical details:

- **Eigenspace analysis**: Using neural tangent kernel (NTK) theory, the paper derives expressions for the dynamics of the Euclidean and cosine losses in the eigenspace of the predictor, revealing the mechanism of implicit variance regularization.
- **Isotropic loss function**: The isotropic loss is designed to remove the effect of the eigenvalues as learning rate multipliers, making the learning dynamics more uniform across eigenmodes.
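The learning-rate-multiplier effect described above can be illustrated with a small NumPy sketch. It assumes a linear predictor that is already diagonal in its eigenbasis, with eigenvalues `eigvals`, and a target branch held constant (stop-gradient). The function names and the particular stop-gradient target used for the isotropic variant are illustrative assumptions chosen to match the description; they are not taken from the paper's code.

```python
import numpy as np

def euclidean_loss(z1_hat, z2_hat, eigvals):
    """0.5 * ||D z1 - z2||^2 with D = diag(eigvals); z2 is a constant target."""
    return 0.5 * np.sum((eigvals * z1_hat - z2_hat) ** 2)

def euclidean_grad(z1_hat, z2_hat, eigvals):
    """Analytic gradient of euclidean_loss w.r.t. z1: each mode's residual
    is multiplied by its eigenvalue (the learning rate multiplier)."""
    return eigvals * (eigvals * z1_hat - z2_hat)

def isotropic_grad(z1_hat, z2_hat, eigvals):
    """Gradient w.r.t. z1 of 0.5 * ||z1 - target||^2, where (assumption)
    target = z2 + z1 - D z1 is held constant. The result is the bare
    residual, without the extra eigenvalue factor."""
    target = z2_hat + z1_hat - eigvals * z1_hat  # treated as a constant
    return z1_hat - target

def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference gradient, used only as a sanity check."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

# Hypothetical values: three eigenmodes with very different eigenvalues.
eigvals = np.array([1.0, 0.1, 0.01])
z1 = np.array([0.5, 0.5, 0.5])
z2 = np.array([1.0, 1.0, 1.0])

residual = eigvals * z1 - z2
g_euc = euclidean_grad(z1, z2, eigvals)
g_iso = isotropic_grad(z1, z2, eigvals)
g_num = numerical_grad(lambda z: euclidean_loss(z, z2, eigvals), z1)

print("residual per mode:     ", residual)
print("Euclidean gradient:    ", g_euc)
print("isotropic-style update:", g_iso)
```

In this toy setting the Euclidean gradient for each mode is its residual scaled by that mode's eigenvalue, so modes with small eigenvalues receive proportionally small updates, while the isotropic-style update is the unscaled residual.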
### Experimental verification:

- **Linear network experiments**: Experiments on a simple linear Siamese network verify the theoretical analysis.
- **Nonlinear network experiments**: Experiments on real-world datasets (CIFAR-10, CIFAR-100, STL-10, and TinyImageNet) use a ResNet-18 backbone. The results show that the isotropic loss function accelerates the initial learning dynamics, improves model performance, and trains stably without an EMA target network.

### Conclusion:

Through theoretical analysis and experiments, the paper shows that the isotropic loss function (IsoLoss) mitigates representational collapse in non-contrastive self-supervised learning, accelerates the initial learning dynamics, and improves robustness. These findings suggest new directions for designing more effective non-contrastive self-supervised learning methods.
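Representational collapse, as discussed above, is commonly diagnosed by inspecting the eigenvalue spectrum of the embedding covariance matrix. The sketch below is a generic diagnostic under that assumption, not the paper's evaluation protocol. The function name `covariance_spectrum` and the synthetic embeddings are hypothetical and serve only to illustrate the idea.

```python
import numpy as np

def covariance_spectrum(embeddings):
    """Eigenvalues (descending) of the covariance of row-wise embeddings."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (embeddings.shape[0] - 1)
    # eigvalsh returns ascending eigenvalues of a symmetric matrix.
    return np.linalg.eigvalsh(cov)[::-1]

rng = np.random.default_rng(0)

# Synthetic stand-ins for network outputs (512 samples, 8 dimensions).
healthy = rng.normal(size=(512, 8))                         # variance in all directions
collapsed = np.ones((512, 8)) + 1e-6 * rng.normal(size=(512, 8))  # nearly constant output
low_rank = rng.normal(size=(512, 2)) @ rng.normal(size=(2, 8))    # variance in 2 directions only

healthy_spec = covariance_spectrum(healthy)
collapsed_spec = covariance_spectrum(collapsed)
low_rank_spec = covariance_spectrum(low_rank)

# Count directions that carry non-negligible variance.
effective_dims = int(np.sum(low_rank_spec > 1e-8))

print("healthy spectrum:  ", healthy_spec)
print("collapsed spectrum:", collapsed_spec)
print("low-rank spectrum: ", low_rank_spec)
```

A spectrum that is near zero everywhere indicates complete collapse, and a spectrum with only a few non-negligible eigenvalues indicates that variance has vanished along most directions.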