Abstract: Following the recent promising results of contrastive learning in the self-supervised paradigm, supervised contrastive learning has extended these approaches to supervised settings, outperforming cross-entropy on various datasets. However, supervised contrastive learning inherently uses label information in binary form (each pair is either positive or negative) through a one-hot target vector. This structure struggles to accommodate methods that treat label information as a probability distribution, such as CutMix and knowledge distillation. In this paper, we introduce a generalized supervised contrastive loss, which measures the cross-entropy between label similarity and latent similarity. This formulation extends supervised contrastive loss by fully utilizing the label distribution, and it allows various existing techniques for training modern neural networks to be adapted to contrastive learning. Leveraging this loss, we construct a tailored framework: Generalized Supervised Contrastive Learning (GenSCL). Compared to existing contrastive learning frameworks, GenSCL incorporates additional enhancements, including advanced image-based regularization techniques and an arbitrary teacher classifier. When applied to ResNet50 with the Momentum Contrast technique, GenSCL achieves a top-1 accuracy of 77.3% on ImageNet, a 4.1% relative improvement over traditional supervised contrastive learning. Moreover, our method establishes new state-of-the-art accuracies of 98.2% and 87.0% on CIFAR10 and CIFAR100, respectively, when applied to ResNet50, the highest reported figures for this architecture.
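
The idea of a cross-entropy between label similarity and latent similarity can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the following choices are assumptions made only for illustration: label similarity is taken as the inner product of two label distributions, latent similarity as a temperature-scaled softmax over cosine similarities, and the function name `generalized_supcon_loss` and the default temperature are hypothetical.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def generalized_supcon_loss(embeddings, label_dists, temperature=0.1):
    """Cross-entropy between label similarity and latent similarity.

    Assumed formulation (not confirmed by the abstract):
      - target for anchor i: inner products of label distributions with
        every other sample j, normalized to sum to 1
      - prediction for anchor i: softmax over cosine similarities to every
        other sample j, divided by the temperature
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Latent similarity: log-softmax over the other samples.
        logits = [_dot(z[i], z[j]) / temperature for j in others]
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        log_probs = [l - log_norm for l in logits]
        # Label similarity: overlap between label distributions.
        sims = [_dot(label_dists[i], label_dists[j]) for j in others]
        s = sum(sims)
        if s == 0.0:
            continue  # anchor shares no label mass with any other sample
        total += -sum((w / s) * lp for w, lp in zip(sims, log_probs))
    return total / n


# Toy batch: two tight clusters in a 2-D latent space.
embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
one_hot = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
# Soft labels, such as those produced by CutMix or a teacher classifier.
mixed = [[0.7, 0.3], [1.0, 0.0], [0.0, 1.0], [0.3, 0.7]]

print(generalized_supcon_loss(embeddings, one_hot))
print(generalized_supcon_loss(embeddings, mixed))
```

In this sketch, one-hot labels make the target uniform over same-class samples, so each anchor's term is the average negative log-probability of its positives, which is the binary positive/negative behavior the abstract describes. Soft labels instead give every pair a graded weight.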
