Abstract: Contrastive learning is a highly successful technique for learning representations of data from labeled tuples that specify the distance relations within each tuple. We study the sample complexity of contrastive learning, i.e. the minimum number of labeled tuples sufficient for getting high generalization accuracy. We give tight bounds on the sample complexity in a variety of settings, focusing on arbitrary distance functions, general $\ell_p$-distances, and tree metrics. Our main result is an (almost) optimal bound on the sample complexity of learning $\ell_p$-distances for integer $p$. For any $p \ge 1$ we show that $\tilde \Theta(\min(nd,n^2))$ labeled tuples are necessary and sufficient for learning $d$-dimensional representations of $n$-point datasets. Our results hold for an arbitrary distribution of the input samples and are based on giving the corresponding bounds on the Vapnik-Chervonenkis/Natarajan dimension of the associated problems. We further show that the theoretical bounds on sample complexity obtained via VC/Natarajan dimension can have strong predictive power for experimental results, in contrast with the folklore belief about a substantial gap between statistical learning theory and the practice of deep learning.

What problem does this paper attempt to address?

This paper investigates the sample complexity of contrastive learning: the minimum number of labeled tuples, each specifying distance relations among its points, needed to learn a representation that generalizes with high accuracy. The analysis covers several settings, namely arbitrary distance functions, general $\ell_p$-distances, and tree metrics. The main contribution is a nearly optimal bound for $\ell_p$-distances with integer $p$: for any $p \ge 1$, $\tilde \Theta(\min(nd,n^2))$ labeled tuples are necessary and sufficient to learn a $d$-dimensional representation of an $n$-point dataset. These results hold for any distribution of the input samples and are derived by bounding the Vapnik-Chervonenkis/Natarajan dimension of the associated problems.

The paper points out that despite recent attention to the theoretical foundations of contrastive learning, most work has approached the topic from other perspectives, such as loss function design and transfer learning. It emphasizes the importance of sample complexity in deep learning: the cost of obtaining samples remains a major consideration even when class labels are available, since training cost scales linearly with the number of samples. In addition, in certain settings sample complexity corresponds directly to annotation cost.

The main results first address the case of k=1, then extend to general values of k. The theoretical results include upper and lower bounds on the sample complexity for arbitrary distances, Euclidean distances, cosine similarity, and tree metrics. The authors also show that these theoretical bounds align with experimental results, supporting the predictive power of classical PAC learning theory in deep learning practice.
In short, this paper addresses the question of how many samples contrastive learning needs in order to learn a good distance function, and it provides tight bounds (almost optimal in the $\ell_p$ case) on the sample complexity for different distance functions and dataset sizes.
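As a rough illustration of the bound stated in the abstract, the following sketch evaluates the leading term $\min(nd, n^2)$ for a few dataset sizes and embedding dimensions. It deliberately ignores the polylogarithmic factors hidden by the tilde in $\tilde \Theta$, so the values indicate an order of growth and are not sample counts prescribed by the paper. The function name `leading_term` is hypothetical and chosen only for this example.

```python
def leading_term(n: int, d: int) -> int:
    """Leading term min(n*d, n^2) of the sample-complexity bound.

    Assumption: polylogarithmic factors (the tilde in Theta-tilde) and
    constants are dropped, so this is an order of growth, not a count.
    """
    if n < 1 or d < 1:
        raise ValueError("n and d must be positive integers")
    return min(n * d, n * n)


if __name__ == "__main__":
    # For d < n the n*d term is the smaller one; once d >= n the bound
    # saturates at n^2 and no longer grows with the dimension.
    for n, d in [(1000, 16), (1000, 128), (1000, 1000), (1000, 4096)]:
        regime = "n*d" if d < n else "n^2"
        print(f"n={n:5d} d={d:5d} leading term={leading_term(n, d):8d} ({regime} regime)")
```

The crossover at $d = n$ is where the two terms coincide: below it the leading term grows linearly in the dimension, and above it the term depends only on $n$.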

Optimal Sample Complexity of Contrastive Learning

Tree Learning: Optimal Algorithms and Sample Complexity

Contrastive estimation reveals topic posterior information to linear models

$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs

The Sample Complexity of Dictionary Learning

On the Sample Complexity of Predictive Sparse Coding

Understanding Contrastive Learning via Distributionally Robust Optimization

CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective

The sample complexity of multi-distribution learning

Understanding Contrastive Learning via Gaussian Mixture Models

Understanding and Generalizing Contrastive Learning from the Inverse Optimal Transport Perspective

Sample Complexity of Nonparametric Semi-Supervised Learning

Learning from weakly dependent data under Dobrushin's condition

Contrastive Learning and Abstract Concepts: The Case of Natural Numbers

Towards the Generalization of Contrastive Self-Supervised Learning

Sample Complexity Result for Multi-category Classifiers of Bounded Variation

Characterizing the Sample Complexity of Private Learners

Efficient block contrastive learning via parameter-free meta-node approximation

Fundamental computational limits of weak learnability in high-dimensional multi-index models

Fine-Grained Representation Learning via Multi-Level Contrastive Learning without Class Priors