Dual-stream Multi-Modal Graph Neural Network for Few-Shot Learning

Wenli Zhang, Luping Shi
DOI: https://doi.org/10.1109/mipr59079.2023.00026
2023-01-01
Abstract: Few-shot learning aims to rapidly recognize unseen targets from only a limited number of labeled samples, a core human capability. However, existing research primarily maps samples into the feature space of a single modality, neglecting the features hidden in other modalities. To address this problem, we develop a Dual-stream Multi-Modal Graph Neural Network (DMMG) that leverages additional information from multiple modalities for few-shot learning. We convert the representations of text and images into each other's feature space in parallel streams. We then compare text and image instances in the different vector spaces simultaneously, exploiting the potential of each modality for few-shot classification by bypassing the information bottleneck. This approach can be extended to a wide range of metric-based few-shot learning methods. Experiments on the miniImageNet dataset demonstrate that DMMG outperforms state-of-the-art few-shot learning methods, highlighting the effectiveness of the proposed approach.
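The dual-stream idea in the abstract can be illustrated with a rough sketch. This is not the DMMG implementation: it swaps the graph neural network for plain class prototypes with cosine similarity, and the names `dual_stream_scores`, `W_i2t`, and `W_t2i` (projection matrices between the image and text spaces, which would be learned in practice) are hypothetical. It only shows how a query might be scored in both feature spaces at once, with the two streams summed.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def dual_stream_scores(query_img, support_img, support_labels,
                       class_text, W_i2t, W_t2i, n_way):
    """Score query images against n_way classes in two feature spaces.

    Assumption: W_i2t and W_t2i are stand-ins for learned cross-modal
    mappings; the abstract does not specify their form.
    """
    # Image-space class prototypes: mean of each class's support images.
    protos_img = np.stack([
        support_img[support_labels == c].mean(axis=0) for c in range(n_way)
    ])

    # Stream 1: map class text into image space, compare in image space.
    text_in_img = class_text @ W_t2i
    img_scores = l2_normalize(query_img) @ l2_normalize(protos_img + text_in_img).T

    # Stream 2: map query images into text space, compare in text space.
    query_in_txt = query_img @ W_i2t
    txt_scores = l2_normalize(query_in_txt) @ l2_normalize(class_text).T

    # Combine both streams (a simple sum; other fusions are possible).
    return img_scores + txt_scores

# Toy 3-way 1-shot episode with identity projections, for illustration only.
n_way = 3
support_img = np.eye(3)
support_labels = np.array([0, 1, 2])
class_text = np.eye(3)
W_i2t = np.eye(3)
W_t2i = np.eye(3)
query_img = np.array([[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]])

scores = dual_stream_scores(query_img, support_img, support_labels,
                            class_text, W_i2t, W_t2i, n_way)
pred = scores.argmax(axis=1)
```

Because the combination happens at the score level, the same pattern could in principle wrap other metric-based few-shot methods by replacing the cosine comparison in each stream.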