Multimodal variational contrastive learning for few-shot classification

Meihong Pan, Hongbin Shen
DOI: https://doi.org/10.1007/s10489-024-05269-5
IF: 5.3
2024-01-25
Applied Intelligence
Abstract: The effectiveness of metric-based few-shot learning methods relies heavily on the discriminative ability of the prototypes and of the query feature embeddings. However, instance-level unimodal prototypes often fall short of capturing the essence of diverse categories. To this end, we propose a multimodal variational contrastive learning (MVC) framework that aims to enhance prototype representativeness and refine the discrimination of query features by acquiring distribution-level representations. Our approach starts by training a variational auto-encoder through supervised contrastive learning in both the visual and semantic spaces. The trained model is then used to augment the support set by sampling features from the learned semantic distributions, and to generate pseudo-semantics for queries, so that information is balanced across samples in both the support and query sets. Furthermore, we establish a multimodal instance-to-distribution model that learns to transform instance-level multimodal features into distribution-level representations via variational inference, facilitating a robust metric. Experiments show that MVC consistently improves accuracy by between 0.5% and 7% over state-of-the-art methods on standard few-shot learning datasets such as miniImageNet, CIFAR-FS, tieredImageNet, and CUB, demonstrating the superiority of our method in terms of classification performance and robustness. The source code is available at: https://github.com/pmhDL/MVC.git.
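As a rough illustration of the idea described in the abstract (not the authors' implementation), the sketch below augments each class's support features with samples drawn from a per-class distribution and then classifies a query by its nearest prototype. The diagonal Gaussian in `class_dists` is a hypothetical stand-in for the semantic distributions learned by the variational auto-encoder, and all function names are assumptions made for this example.

```python
import math
import random


def sample_gaussian(mean, std, n, rng):
    """Draw n feature vectors from a diagonal Gaussian N(mean, diag(std^2))."""
    return [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(n)]


def prototype(features):
    """Class prototype: element-wise mean of the feature vectors."""
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]


def build_prototypes(support, class_dists, n_aug, seed=0):
    """Augment each class's support features with sampled ones, then average.

    support:     {label: [feature_vector, ...]}  (the few labelled shots)
    class_dists: {label: (mean, std)}            (assumed, stands in for the
                                                  learned distribution)
    """
    rng = random.Random(seed)
    protos = {}
    for label, feats in support.items():
        mean, std = class_dists[label]
        augmented = feats + sample_gaussian(mean, std, n_aug, rng)
        protos[label] = prototype(augmented)
    return protos


def classify(query, prototypes):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


if __name__ == "__main__":
    support = {"cat": [[0.0, 0.0]], "dog": [[4.0, 4.0]]}
    class_dists = {
        "cat": ([0.0, 0.0], [0.1, 0.1]),
        "dog": ([4.0, 4.0], [0.1, 0.1]),
    }
    protos = build_prototypes(support, class_dists, n_aug=5)
    print(classify([0.5, 0.2], protos))
```

The sketch uses plain Euclidean distance to instance-level prototypes for simplicity; the paper's instance-to-distribution model and contrastive training are not reproduced here.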
computer science, artificial intelligence
What problem does this paper attempt to address?