Abstract: In recent studies on few-shot classification, most existing methods use word embeddings as prior knowledge to adjust the distribution of visual prototypes. However, this straightforward fusion of visual and semantic features profoundly alters the feature distribution in the original feature space, which prevents the feature distribution from being calibrated effectively through mutual guidance of cross-modal information. To address this problem, we propose a novel Bigraph Mutual Prototype Calibration Network (BMPCN) for few-shot learning. BMPCN not only updates the distribution of class features based on prototype-level similarity in both the visual and semantic spaces, but also facilitates mutual guidance of visual and semantic feature updates through instance-level similarity. In BMPCN, we propose a bigraph mutual promotion structure, in which a visual graph is constructed with visual features as nodes and the similarities between visual features as edges. Simultaneously, the semantic feature nodes are automatically generated from images, and class-level prior knowledge is leveraged to correct these automatically generated semantic nodes. To better update the bigraph mutual promotion structure, we propose three modules: a Bigraph Interactive Augmentation Module (BIAM), a Nearest Neighbor Proto-level Similarity Promotion Module (NN-PSP), and a Proto-level Similarity Promotion Module based on original knowledge augmentation (PK-PSP). For inter-graph updating, we use the prototype-level similarity obtained from the NN-PSP and PK-PSP modules to fully learn task-level information, thus enabling task-specific prototype updates. For intra-graph updating, our visual and semantic graphs use instance-level similarity analysis to extract potential correlations between different feature domains and implement mutual guidance in the BIAM module to correct the feature distribution of visual and semantic features.
Experiments on three widely used benchmarks show that the proposed method achieves strong performance with a Conv-4 backbone, outperforming state-of-the-art methods by about 8% on miniImageNet, tieredImageNet, and CUB-200-2011. Code is available at https://github.com/cmzHome/BMPCN-MASTER.
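
To make two ingredients named in the abstract concrete, the following minimal sketch builds a graph whose nodes are feature vectors and whose edges are pairwise similarities, and computes class prototypes that are compared with a query. It is an illustration under stated assumptions, not the authors' implementation: cosine similarity, mean-based prototypes, the toy 2-D features, and all function names are hypothetical choices, and none of the BIAM, NN-PSP, or PK-PSP modules is reproduced here.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def class_prototypes(features, labels):
    """Assumed prototype definition: the mean support feature of each class."""
    by_class = {}
    for feature, label in zip(features, labels):
        by_class.setdefault(label, []).append(feature)
    return {label: mean_vector(group) for label, group in sorted(by_class.items())}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def similarity_graph(nodes):
    """Dense adjacency matrix: entry [i][j] is the similarity of nodes i and j."""
    return [[cosine(u, v) for v in nodes] for u in nodes]


def nearest_prototype(query, prototypes):
    """Label of the prototype most similar to the query feature."""
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


# Toy 2-way 2-shot task with made-up 2-D features (illustrative only).
support = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
labels = [0, 0, 1, 1]
queries = [[0.7, 0.3], [0.3, 0.7]]

adjacency = similarity_graph(support)          # instance-level similarities
prototypes = class_prototypes(support, labels)  # one prototype per class
predictions = [nearest_prototype(q, prototypes) for q in queries]
```

In this sketch, `adjacency` plays the role of instance-level similarity within one graph, while the comparison against `prototypes` stands in for prototype-level similarity; how BMPCN actually combines the two across the visual and semantic graphs is described only at a high level in the abstract.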

Better Integrating Vision and Semantics for Improving Few-shot Classification

FewVS: A Vision-Semantics Integration Framework for Few-Shot Image Classification

Bimodal semantic fusion prototypical network for few-shot classification

Multimodal variational contrastive learning for few-shot classification

FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models?

Binocular Mutual Learning for Improving Few-shot Classification

VSA: Adaptive Visual and Semantic Guided Attention on Few-Shot Learning

Improving the Generalised Few-shot Learning by Semantic Information

Adaptive Cross-Modal Few-Shot Learning

Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models

Bidirectional Matching Prototypical Network for Few-Shot Image Classification

Semantic-Based Few-Shot Learning by Interactive Psychometric Testing

Cross-Modal Mapping: Eliminating the Modality Gap for Few-Shot Image Classification

Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation

BMPCN: A Bigraph Mutual Prototype Calibration Net for Few-Shot Classification

Semantic-Aligned Attention with Refining Feature Embedding for Few-Shot Image Classification

Semantic-Based Few-Shot Classification by Psychometric Learning

Transductive Semantic Decoupling Double Variational Inference for Few-Shot Classification

Enhanced Visual Categorization Performances by Incorporation of Simple Features into Bim Features

Multimodal few-shot classification without attribute embedding

Distilling base-and-meta network with contrastive learning for few-shot semantic segmentation