Poly-View Contrastive Learning

Amitis Shidani, Devon Hjelm, Jason Ramapuram, Russ Webb, Eeshan Gunesh Dhekane, Dan Busbridge
2024-03-09
Abstract: Contrastive learning typically matches pairs of related views among a number of unrelated negative views. Views can be generated (e.g. by augmentations) or be observed. We investigate matching when there are more than two related views which we call poly-view tasks, and derive new representation learning objectives using information maximization and sufficient statistics. We show that with unlimited computation, one should maximize the number of related views, and with a fixed compute budget, it is beneficial to decrease the number of unique samples whilst increasing the number of views of those samples. In particular, poly-view contrastive models trained for 128 epochs with batch size 256 outperform SimCLR trained for 1024 epochs at batch size 4096 on ImageNet1k, challenging the belief that contrastive models require large batch sizes and many training epochs.
Artificial Intelligence, Machine Learning, Computer Vision and Pattern Recognition, Information Theory
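The compute trade-off described in the abstract can be made concrete with some simple arithmetic. The sketch below is illustrative only. It assumes training cost is proportional to the number of view encodings (epochs × dataset size × views per sample), and it ignores the similarity-matrix cost, which grows with batch size. The helper names are hypothetical. The multiplicity M of the poly-view run is left as a parameter because the abstract does not state it.

```python
from fractions import Fraction
from typing import Iterable, List, Tuple


def view_encodings(epochs: int, dataset_size: int, views_per_sample: int) -> int:
    # Assumed cost model: each epoch visits every sample once and encodes all its views.
    return epochs * dataset_size * views_per_sample


def relative_cost(poly_epochs: int, poly_views: int,
                  base_epochs: int = 1024, base_views: int = 2) -> Fraction:
    # Ratio of total view encodings. Dataset size cancels, so it is set to 1,
    # and batch size never appears under this cost model.
    return Fraction(view_encodings(poly_epochs, 1, poly_views),
                    view_encodings(base_epochs, 1, base_views))


def split_budget(views_per_step: int,
                 multiplicities: Iterable[int]) -> List[Tuple[int, int]]:
    # (unique samples N, views per sample M) pairs with N * M == views_per_step,
    # i.e. a fixed per-step budget traded between samples and views.
    return [(views_per_step // m, m) for m in multiplicities
            if views_per_step % m == 0]


# Baseline from the abstract: two-view SimCLR trained for 1024 epochs.
# Poly-view run: 128 epochs with a hypothetical multiplicity M.
for m in (2, 4, 8, 16):
    print(f"M={m}:", relative_cost(128, m), "of the baseline cost")
# prints 1/8, 1/4, 1/2 and 1 for M = 2, 4, 8, 16

print(split_budget(1024, (2, 4, 8)))  # [(512, 2), (256, 4), (128, 8)]
```

Under this cost model, a 128-epoch poly-view run costs M/16 of the 1024-epoch two-view baseline, so it is cheaper for any M below 16. The per-step split shows the fixed-budget choice between more unique samples and more views per sample.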
What problem does this paper attempt to address?
This paper addresses poly-view contrastive learning, which extends the standard contrastive learning framework to exploit more than two related views of the same data instance. Traditional contrastive learning matches pairs of views, whereas this paper asks how to design representation learning tasks when many related views are available. Using information maximization and sufficient statistics, the authors derive new representation learning objectives that go beyond pairwise matching and take all views into account. The main contributions of the paper are:
1. Generalizing the information-theoretic foundation of contrastive learning to poly-view tasks, which yields a new family of representation learning algorithms.
2. Offering an alternative perspective on poly-view contrastive learning based on sufficient statistics, and introducing a new loss function. When the number of views is 2, this loss reduces to the well-known SimCLR loss, which provides a new interpretation of contrastive learning.
3. Showing experimentally that, in image representation learning, higher view multiplicity yields a new compute Pareto frontier: under a limited compute budget, it is beneficial to reduce the number of unique samples while increasing the number of views per sample. Specifically, a poly-view contrastive model trained for 128 epochs with batch size 256 outperforms SimCLR trained for 1024 epochs with batch size 4096 on ImageNet1k.
The paper also studies the effect of the number of views, indicating that increasing it can improve the gradient signal-to-noise ratio and model performance, although it does not directly increase the lower bound on mutual information. The authors analyze and design the new learning objectives using information gain, conditional independence between views, and information-theoretic lower bounds.
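To illustrate how a single objective can use all views of a sample at once, here is a minimal Python sketch in the spirit of the sufficient-statistics perspective. It is not the paper's exact loss. As a simplifying assumption, the statistic that summarizes the other views of a sample is taken to be the mean of their L2-normalized embeddings, and all function names are hypothetical. With two views, the statistic is just the other view, so the sketch collapses to a symmetric two-view InfoNCE. SimCLR's NT-Xent loss additionally uses same-view negatives, which this sketch omits.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    # L2-normalize an embedding so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def rest_mean_contrastive_loss(z, temperature: float = 0.1) -> float:
    """Hypothetical poly-view loss (illustrative sketch only).

    z[i][a] is the embedding of view a of sample i (N samples, M >= 2 views).
    Each anchor view z[i][a] is scored against, for every sample j, the mean of
    j's normalized views excluding view a. The loss is the cross-entropy of
    picking j == i, averaged over all N * M anchors.
    """
    n, m = len(z), len(z[0])
    zs = [[normalize(v) for v in views] for views in z]
    total = 0.0
    for a in range(m):
        # One "rest of the views" statistic per sample, leaving out view index a.
        stats = [mean([zs[j][b] for b in range(m) if b != a]) for j in range(n)]
        for i in range(n):
            logits = [dot(zs[i][a], stats[j]) / temperature for j in range(n)]
            # Numerically stable log-sum-exp over the candidate samples.
            top = max(logits)
            log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
            total += log_norm - logits[i]
    return total / (n * m)


# Usage: 3 samples, 4 views each, 2-D embeddings (made-up toy values).
views = [
    [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1], [0.8, 0.05]],
    [[0.0, 1.0], [0.1, 0.9], [-0.1, 1.0], [0.05, 0.8]],
    [[-1.0, 0.0], [-0.9, 0.1], [-1.0, -0.1], [-0.8, 0.0]],
]
print(rest_mean_contrastive_loss(views))
```

Averaging over every anchor view means each step draws learning signal from all M views rather than from a single pair. The choice of the mean as the summary statistic is the main assumption to revisit against the paper.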