Towards the Generalization of Contrastive Self-Supervised Learning

Weiran Huang, Mingyang Yi, Xuyang Zhao, Zihao Jiang
DOI: https://doi.org/10.48550/arXiv.2111.00743
2023-03-02
Abstract: Recently, self-supervised learning has attracted great attention, since it only requires unlabeled data for model training. Contrastive learning is one popular method for self-supervised learning and has achieved promising empirical performance. However, the theoretical understanding of its generalization ability is still limited. To this end, we define a kind of $(\sigma,\delta)$-measure to mathematically quantify the data augmentation, and then provide an upper bound of the downstream classification error rate based on the measure. It reveals that the generalization ability of contrastive self-supervised learning is related to three key factors: alignment of positive samples, divergence of class centers, and concentration of augmented data. The first two factors are properties of learned representations, while the third one is determined by pre-defined data augmentation. We further investigate two canonical contrastive losses, InfoNCE and cross-correlation, to show how they provably achieve the first two factors. Moreover, we conduct experiments to study the third factor, and observe a strong correlation between downstream performance and the concentration of augmented data.
Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the limited theoretical understanding of how well Contrastive Self-Supervised Learning (CSSL) generalizes to downstream tasks. Although CSSL has achieved remarkable empirical performance in fields such as computer vision and natural language processing, the theoretical basis for its generalization ability remains incomplete. The authors quantify the effect of data augmentation by defining a new measure, (σ,δ)-augmentation, and use it to derive an upper bound on the downstream classification error rate. The bound shows that the generalization ability of contrastive self-supervised learning depends on three key factors: the alignment of positive samples, the divergence of class centers, and the concentration of augmented data. The first two factors are properties of the learned representation, while the third is determined by the predefined data augmentation. The main contributions of the paper are:
1. Proposing a new (σ,δ)-measure to quantify data augmentation.
2. Establishing a theoretical framework that identifies alignment, divergence, and concentration as the key factors in the generalization ability of contrastive self-supervised learning.
3. Demonstrating that both the InfoNCE loss and the cross-correlation loss provably achieve the alignment and divergence conditions.
4. Showing experimentally that downstream performance is strongly correlated with the concentration of augmented data.
Together, these results provide a new perspective on, and theoretical support for, the generalization ability of contrastive self-supervised learning.
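To make the first two factors concrete, here is a minimal NumPy sketch of how alignment and class-center divergence could be estimated from embeddings, together with a common in-batch form of the InfoNCE loss. This illustrates the concepts only and is not the authors' code: the function names, the use of the mean Euclidean distance for alignment, the use of the largest inner product between class centers for divergence, and the temperature value are all assumptions, and the InfoNCE variant analyzed in the paper may differ in detail.

```python
import numpy as np


def l2_normalize(z, eps=1e-12):
    """Scale each row of z to unit Euclidean norm."""
    return z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), eps)


def alignment(z1, z2):
    """Mean distance between embeddings of two augmented views of the same
    samples (row i of z1 pairs with row i of z2). Smaller = better aligned.
    Assumption: Euclidean distance is used as the alignment proxy."""
    return float(np.mean(np.linalg.norm(z1 - z2, axis=1)))


def max_center_similarity(z, labels):
    """Largest inner product between two distinct class centers.
    Smaller = more divergent centers. Assumes at least two classes."""
    classes = np.unique(labels)
    centers = np.stack([z[labels == c].mean(axis=0) for c in classes])
    gram = centers @ centers.T
    off_diagonal = gram[~np.eye(len(classes), dtype=bool)]
    return float(off_diagonal.max())


def info_nce(z1, z2, temperature=0.5):
    """InfoNCE with in-batch negatives: row i of z1 should match row i of z2
    and mismatch every other row of z2. The temperature is illustrative."""
    logits = (z1 @ z2.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

With unit-normalized embeddings `z1` and `z2` of two augmented views and downstream labels `y`, one could track `alignment(z1, z2)` and `max_center_similarity(z1, y)` during training. The third factor, concentration, is a property of the augmentation itself and is not captured by this sketch.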