Understanding the Role of Equivariance in Self-supervised Learning

Yifei Wang, Kaiwen Hu, Sharut Gupta, Ziyu Ye, Yisen Wang, Stefanie Jegelka
2024-11-11
Abstract: Contrastive learning has been a leading paradigm for self-supervised learning, but it is widely observed that it comes at the price of sacrificing useful features (e.g., colors) by being invariant to data augmentations. Given this limitation, there has been a surge of interest in equivariant self-supervised learning (E-SSL) that learns features to be augmentation-aware. However, even for the simplest rotation prediction method, there is a lack of rigorous understanding of why, when, and how E-SSL learns useful features for downstream tasks. To bridge this gap between practice and theory, we establish an information-theoretic perspective to understand the generalization ability of E-SSL. In particular, we identify a critical explaining-away effect in E-SSL that creates a synergy between the equivariant and classification tasks. This synergy effect encourages models to extract class-relevant features to improve its equivariant prediction, which, in turn, benefits downstream tasks requiring semantic features. Based on this perspective, we theoretically analyze the influence of data transformations and reveal several principles for practical designs of E-SSL. Our theory not only aligns well with existing E-SSL methods but also sheds light on new directions by exploring the benefits of model equivariance. We believe that a theoretically grounded understanding on the role of equivariance would inspire more principled and advanced designs in this field. Code is available at https://github.com/kaotty/Understanding-ESSL.
Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition, Information Theory
What problem does this paper attempt to address?
This paper addresses the lack of theoretical grounding for equivariant representation learning in self-supervised learning (SSL). Specifically:

1. **Limitations of invariant self-supervised learning**: Contrastive learning, the dominant SSL paradigm, builds positive and negative sample pairs through data augmentation and learns representations that are invariant to those input transformations. This invariance often discards useful information (such as color), which hurts performance on downstream tasks that need it.
2. **Insufficient theoretical understanding of equivariant self-supervised learning (E-SSL)**: Although E-SSL performs well in practice, its theoretical basis remains incomplete. Even for the simplest method, rotation prediction, there is no rigorous explanation of why, when, and how E-SSL learns features that are useful for downstream tasks.

To address these problems, the paper makes the following contributions:

- **Information-theoretic perspective**: The authors establish an information-theoretic framework for understanding the generalization ability of E-SSL. Within it they identify a key phenomenon, the "explaining-away effect", which creates a synergy between the equivariant task and the classification task.
- **Synergy effect**: This synergy encourages the model to extract class-relevant features in order to improve its equivariant prediction, which in turn benefits downstream tasks that require semantic features.
- **Design principles**: Based on this framework, the authors derive several practical design principles:
  - **Lossy transformations**: Choose transformations whose parameters cannot be fully inferred from the transformed input.
  - **Class relevance**: Ensure that extracting class information actually improves equivariant prediction.
  - **Shortcut pruning**: Prevent style features from becoming shortcuts for equivariant prediction, so that class-relevant semantic features are learned instead.

Through this theoretical analysis and the supporting experiments, the paper narrows the gap in the theoretical understanding of E-SSL and offers guidance for designing future E-SSL methods.

### Formula presentation

Some of the key formulas involved in the paper are as follows:

- **Mutual information**: \[I(A;Z)=H(A)-H(A|Z)\] where \(H(A)\) is the entropy of \(A\) and \(H(A|Z)\) is the conditional entropy of \(A\) given \(Z\).
- **Explaining-away effect**: \[I(A;C|Z)=H(A|Z)-H(A|Z,C)>0\] A strictly positive value means that, given the representation \(Z\), additionally knowing the class \(C\) reduces the remaining uncertainty about the transformation \(A\).
- **Simplified data generation process**: \[X = A+\lambda C\] where \(X\) is the transformed input, \(A\) is the transformation variable, \(C\) is the class variable, and \(\lambda\) is the mixing coefficient.

The paper uses these formulas to explain when E-SSL is effective and to motivate its design principles.
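The formulas above can be checked numerically on a small discrete toy case. The sketch below is not the paper's code: it assumes \(A\) and \(C\) are independent uniform bits, that \(X = A+\lambda C\) with an integer \(\lambda\), and that the representation is simply \(Z = X\). The function names (`joint_distribution`, `conditional_entropy`, `summarize`) are hypothetical and exist only for this illustration.

```python
import itertools
import math
from collections import defaultdict

# Axis indices into each outcome tuple (a, c, x).
A, C, X = 0, 1, 2


def joint_distribution(lam):
    """Joint p(a, c, x) for independent uniform bits A, C and X = A + lam * C.

    Assumption for illustration only: binary A and C, integer lam.
    """
    return {(a, c, a + lam * c): 0.25
            for a, c in itertools.product([0, 1], repeat=2)}


def conditional_entropy(joint, target, given=()):
    """H(target | given) in bits; target and given are tuples of axis indices."""
    p_joint = defaultdict(float)  # p(target, given)
    p_given = defaultdict(float)  # p(given)
    for outcome, p in joint.items():
        g = tuple(outcome[i] for i in given)
        t = tuple(outcome[i] for i in target)
        p_joint[(t, g)] += p
        p_given[g] += p
    return -sum(p * math.log2(p / p_given[g])
                for (_, g), p in p_joint.items() if p > 0)


def summarize(lam):
    """Evaluate the three quantities from the formulas, taking Z = X."""
    joint = joint_distribution(lam)
    h_a = conditional_entropy(joint, (A,))
    h_a_given_x = conditional_entropy(joint, (A,), (X,))
    h_a_given_xc = conditional_entropy(joint, (A,), (X, C))
    h_a_given_c = conditional_entropy(joint, (A,), (C,))
    return {
        "I(A;X)": h_a - h_a_given_x,             # I(A;Z) = H(A) - H(A|Z)
        "I(A;C|X)": h_a_given_x - h_a_given_xc,  # explaining-away term
        "I(A;C)": h_a - h_a_given_c,             # zero: A and C are independent
    }


if __name__ == "__main__":
    for lam in (1, 2):
        print(lam, summarize(lam))
```

With \(\lambda = 1\) the outcome \(X = 1\) is ambiguous between \((A, C) = (1, 0)\) and \((0, 1)\), so \(I(A;X) = 0.5\) bit and \(I(A;C|X) = 0.5\) bit even though \(I(A;C) = 0\): knowing the class resolves the remaining uncertainty about the transformation. With \(\lambda = 2\) every \((A, C)\) pair maps to a distinct \(X\), so \(A\) is fully recoverable from \(X\) and the explaining-away term drops to zero, which is the toy analogue of the lossy-transformation principle.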