Towards the Generalization of Contrastive Self-Supervised Learning

Weiran Huang, Mingyang Yi, Xuyang Zhao, Zihao Jiang

DOI: https://doi.org/10.48550/arXiv.2111.00743

2023-03-02

Abstract: Recently, self-supervised learning has attracted great attention, since it only requires unlabeled data for model training. Contrastive learning is one popular method for self-supervised learning and has achieved promising empirical performance. However, the theoretical understanding of its generalization ability is still limited. To this end, we define a kind of $(\sigma,\delta)$-measure to mathematically quantify the data augmentation, and then provide an upper bound of the downstream classification error rate based on the measure. It reveals that the generalization ability of contrastive self-supervised learning is related to three key factors: alignment of positive samples, divergence of class centers, and concentration of augmented data. The first two factors are properties of learned representations, while the third one is determined by pre-defined data augmentation. We further investigate two canonical contrastive losses, InfoNCE and cross-correlation, to show how they provably achieve the first two factors. Moreover, we conduct experiments to study the third factor, and observe a strong correlation between downstream performance and the concentration of augmented data.
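The three factors can be made concrete with a few empirical diagnostics. The following is a minimal sketch under stated assumptions, not the authors' code. It follows our reading of the paper's definitions: the augmented distance d_A(x1, x2) is the smallest Euclidean distance between any augmented view of x1 and any augmented view of x2, and a class meets the (σ,δ) condition when a "main part" holding at least a σ fraction of the class has d_A-diameter at most δ. Samples are weighted uniformly, embeddings are assumed to be already unit-norm, and all function names (`aug_distance`, `is_sigma_delta`, `alignment`, `class_centers`, `max_center_similarity`) are hypothetical.

```python
import math
from itertools import product

def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def aug_distance(views1, views2):
    """d_A(x1, x2): smallest distance between any view of x1 and any view of x2."""
    return min(dist(a, b) for a, b in product(views1, views2))

def is_sigma_delta(class_views, main_idx, sigma, delta):
    """Check one class: does the candidate main part `main_idx` hold at least a
    sigma fraction of the class (uniform weights assumed) and have
    d_A-diameter at most delta?"""
    frac = len(main_idx) / len(class_views)
    diameter = max(
        (aug_distance(class_views[i], class_views[j]) for i in main_idx for j in main_idx),
        default=0.0,
    )
    return frac >= sigma and diameter <= delta

def alignment(emb_views):
    """Mean distance between embeddings of two views of the same sample, with the
    two views drawn independently and uniformly (identical pairs included)."""
    total, count = 0.0, 0
    for views in emb_views:
        for a, b in product(views, views):
            total += dist(a, b)
            count += 1
    return total / count

def class_centers(emb_by_class):
    """Center of each class: mean embedding over all augmented views of its samples."""
    centers = {}
    for k, samples in emb_by_class.items():
        vecs = [v for views in samples for v in views]
        centers[k] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return centers

def max_center_similarity(centers):
    """Largest inner product between two distinct class centers (needs at least
    two classes); a lower value means the centers are more divergent."""
    keys = sorted(centers)
    return max(
        sum(a * b for a, b in zip(centers[i], centers[j]))
        for i in keys for j in keys if i < j
    )

# Toy usage: one class with three samples, two augmented views each (input space).
views = [[[1.0, 0.0], [0.9, 0.1]], [[0.8, 0.0], [1.0, 0.2]], [[5.0, 5.0], [5.0, 6.0]]]
print(is_sigma_delta(views, [0, 1], sigma=0.6, delta=0.2))  # True
```

In the toy usage, the first two samples form a main part covering 2/3 of the class with a d_A-diameter of about 0.14, while the third sample acts as an outlier left outside it. A smaller δ at a fixed σ corresponds to more concentrated augmented data; a low `alignment` value and a low `max_center_similarity` value correspond to the first two factors.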

Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the limited theoretical understanding of why Contrastive Self-Supervised Learning (CSSL) generalizes to downstream tasks. Although CSSL has achieved remarkable empirical performance in fields such as computer vision and natural language processing, the theoretical basis for its generalization ability remains incomplete. The authors quantify data augmentation with a new measure, (σ,δ)-augmentation, and use it to derive an upper bound on the downstream classification error rate. The bound shows that the generalization ability of contrastive self-supervised learning depends on three key factors: the alignment of positive samples, the divergence of class centers, and the concentration of augmented data. The first two factors are properties of the learned representation, while the third is determined by the predefined data augmentation. The main contributions of the paper are:

1. Proposing the (σ,δ)-measure to quantify data augmentation.
2. Establishing a theoretical framework in which alignment, divergence, and concentration are the key factors behind the generalization ability of contrastive self-supervised learning.
3. Proving that not only the InfoNCE loss but also the cross-correlation loss satisfies the alignment and divergence conditions.
4. Showing experimentally that downstream performance is strongly correlated with the concentration of augmented data.

Through these results, the paper offers a new perspective on, and theoretical support for, the generalization ability of contrastive self-supervised learning.
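For reference, here is a minimal pure-Python sketch of the two losses named above, written in commonly used batched forms that may differ in detail from the expectation forms analysed in the paper. The first is an InfoNCE loss over cosine similarities with temperature τ, where the second views of the other samples in the batch act as negatives. The second is a Barlow Twins-style cross-correlation loss that standardizes each embedding dimension over the batch, pushes the diagonal of the cross-correlation matrix C toward 1, and penalizes off-diagonal entries with weight λ. The function names, the one-directional InfoNCE form, and the defaults τ = 0.5 and λ = 0.005 are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(z1, z2, tau=0.5):
    """One-directional InfoNCE over a batch (assumed form): row i of z1 and row i
    of z2 are a positive pair; the other rows of z2 serve as negatives."""
    n = len(z1)
    total = 0.0
    for i in range(n):
        logits = [cosine(z1[i], z2[j]) / tau for j in range(n)]
        m = max(logits)  # log-sum-exp trick for numerical stability
        log_denominator = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_denominator - logits[i]  # -log softmax probability of the positive
    return total / n

def standardize_columns(z, eps=1e-9):
    """Return the columns of z, each shifted to zero mean and scaled to unit
    (population) standard deviation over the batch."""
    n = len(z)
    columns = []
    for col in zip(*z):
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        columns.append([(x - mean) / (std + eps) for x in col])
    return columns

def cross_correlation_loss(z1, z2, lam=0.005):
    """Barlow Twins-style loss: sum of (1 - C_aa)^2 over the diagonal plus
    lam * C_ab^2 over off-diagonal entries (lam is an assumed default)."""
    c1, c2 = standardize_columns(z1), standardize_columns(z2)
    n = len(z1)
    loss = 0.0
    for a, col_a in enumerate(c1):
        for b, col_b in enumerate(c2):
            c_ab = sum(x * y for x, y in zip(col_a, col_b)) / n
            loss += (1.0 - c_ab) ** 2 if a == b else lam * c_ab ** 2
    return loss

# Toy usage: matched views score a lower InfoNCE loss than mismatched ones.
e = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(e, e, tau=1.0) < info_nce(e, e[::-1], tau=1.0))  # True
```

Intuitively, the positive-pair term in both losses pulls the two views of a sample together, which relates to alignment, while the InfoNCE denominator and the off-diagonal penalty of the cross-correlation loss discourage collapsed representations.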

Boosting Unsupervised Contrastive Learning Using Diffusion-Based Data Augmentation from Scratch

Towards the Out-of-Distribution Generalization of Contrastive Self-Supervised Learning

Feature Augmentation for Self-supervised Contrastive Learning: A Closer Look

Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning

Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss

A Survey on Contrastive Self-Supervised Learning

On the duality between contrastive and non-contrastive self-supervised learning

A Unified Contrastive Loss for Self-Training

Generalized Supervised Contrastive Learning

Is Self-Supervised Learning More Robust Than Supervised Learning?

Your Contrastive Learning Is Secretly Doing Stochastic Neighbor Embedding

Investigating Contrastive Pair Learning's Frontiers in Supervised, Semisupervised, and Self-Supervised Learning

Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning Via Augmentation Overlap

What Should Not Be Contrastive in Contrastive Learning

The Power of Contrast for Feature Learning: A Theoretical Analysis

Contrastive Learning With Stronger Augmentations

Data-Efficient Contrastive Self-supervised Learning: Most Beneficial Examples for Supervised Learning Contribute the Least

Siamese Prototypical Contrastive Learning

A Generalization Theory of Cross-Modality Distillation with Contrastive Learning

Self-Damaging Contrastive Learning