Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE
Junya Chen, Zhe Gan, Xuan Li, Qing Guo, Liqun Chen, Shuyang Gao, Tagyoung Chung, Yi Xu, Belinda Zeng, Wenlian Lu, Fan Li, Lawrence Carin, Chenyang Tao
DOI: https://doi.org/10.48550/arXiv.2107.01152
2021-07-02
Abstract: InfoNCE-based contrastive representation learners, such as SimCLR, have been tremendously successful in recent years. However, these contrastive schemes are notoriously resource-demanding, as their effectiveness breaks down with small-batch training (i.e., the log-K curse, where K is the batch size). In this work, we reveal mathematically why contrastive learners fail in the small-batch-size regime, and present a novel, simple yet non-trivial contrastive objective named FlatNCE that fixes this issue. Unlike InfoNCE, FlatNCE no longer explicitly appeals to a discriminative classification goal for contrastive learning. Theoretically, we show that FlatNCE is the mathematical dual formulation of InfoNCE, thus bridging to the classical literature on energy modeling; empirically, we demonstrate that, with minimal code modification, FlatNCE delivers an immediate performance boost independent of subject-matter engineering efforts. The significance of this work is furthered by the powerful generalization of contrastive learning techniques and by the introduction of new tools to monitor and diagnose contrastive training. We substantiate our claims with empirical evidence on CIFAR10, ImageNet, and other datasets, where FlatNCE consistently outperforms InfoNCE.
Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition, Information Theory
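The log-K curse named in the abstract can be illustrated with the standard InfoNCE estimator. The sketch below is a minimal illustration, not the authors' code and not FlatNCE: it assumes a generic critic that has already produced a score f(x, y) for each anchor's positive pair and its K - 1 in-batch negatives, and the function names `info_nce_loss` and `info_nce_mi_estimate` are hypothetical. Because the per-anchor InfoNCE loss is a negative log-softmax probability, it is never below zero, so the derived mutual-information estimate log K - loss can never exceed log K, however good the critic becomes.

```python
import math


def info_nce_loss(scores):
    """InfoNCE loss for a single anchor (hypothetical helper).

    scores[0] is the critic score f(x, y) of the positive pair;
    scores[1:] are the scores of the K - 1 in-batch negatives.
    Returns -log softmax(scores)[0], which is always >= 0.
    """
    m = max(scores)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[0]


def info_nce_mi_estimate(batch_scores):
    """Mutual-information estimate log K - mean InfoNCE loss.

    batch_scores is a list of per-anchor score lists, each of length K.
    Since every loss term is >= 0, the estimate is capped at log K.
    """
    k = len(batch_scores[0])
    mean_loss = sum(info_nce_loss(s) for s in batch_scores) / len(batch_scores)
    return math.log(k) - mean_loss


# A near-perfect critic: the positive scores far above every negative.
for k in (4, 64, 256):
    batch = [[50.0] + [0.0] * (k - 1)]
    print(k, info_nce_mi_estimate(batch), math.log(k))
```

Even with the positive pair separated by a wide margin, the estimate saturates at log K (about 1.39 nats for K = 4), which is why a small batch limits how much dependence InfoNCE can register.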