Cross-Modal Meta-Knowledge Transfer: A Meta-Learning Framework Adaptable for Multimodal Tasks

Yuhe Chen, Jingxuan Jin, De Li, Peng Wang
DOI: https://doi.org/10.1145/3675249.3675347
2024-01-01
Abstract: Due to the significant disparities among data from different modalities, multimodal few-shot learning has long been a challenging problem in artificial intelligence. Meta-learning is a more data-efficient training framework than traditional machine learning, but its application to multimodal few-shot tasks has not yet been thoroughly investigated. This paper therefore proposes a novel two-stage multimodal meta-learning framework. First, we define how multimodal meta-tasks are constructed, decomposing the model's training into a series of multimodal meta-task collections. The framework learns the complementary information of different modalities in a phased manner. Second, we use a language generation model to obtain additional textual information as training samples and combine it with images, enriching the multimodal semantic features the model extracts. Finally, we evaluate our method on few-shot classification tasks. Experimental results indicate that the proposed training framework outperforms recent methods across multiple datasets.
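The abstract does not spell out how a multimodal meta-task is built. A common convention in few-shot learning is the N-way K-shot episode, and the following is a minimal sketch of that convention, assuming each class holds paired (image, text) samples where the text could come from a language generation model. The names `MultimodalSample` and `build_episode` and the dataset layout are hypothetical illustrations, not the authors' implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MultimodalSample:
    # Hypothetical container (assumption): an image reference plus an
    # associated text description, e.g. one produced by a language model.
    image: str
    text: str


def build_episode(dataset, n_way, k_shot, q_query, rng=None):
    """Sample one N-way K-shot multimodal meta-task (episode).

    dataset: dict mapping class name -> list of MultimodalSample (assumed layout).
    Returns (support, query), each a list of (sample, episode_label) pairs,
    where episode_label is an integer in range(n_way), re-indexed per episode.
    """
    rng = rng or random.Random()
    # Only classes with enough samples for both support and query sets qualify.
    eligible = [c for c, items in dataset.items() if len(items) >= k_shot + q_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + q_query samples")
    classes = rng.sample(sorted(eligible), n_way)

    support, query = [], []
    for label, cls in enumerate(classes):
        # Draw disjoint support and query samples for this class.
        picked = rng.sample(dataset[cls], k_shot + q_query)
        support += [(s, label) for s in picked[:k_shot]]
        query += [(s, label) for s in picked[k_shot:]]
    return support, query


if __name__ == "__main__":
    # Toy data: 8 classes with 10 image-text pairs each.
    data = {
        f"class{c}": [
            MultimodalSample(f"img_{c}_{i}.jpg", f"a photo of class {c}, view {i}")
            for i in range(10)
        ]
        for c in range(8)
    }
    support, query = build_episode(data, n_way=5, k_shot=1, q_query=3, rng=random.Random(0))
    print(len(support), len(query))  # 5 15
```

Re-indexing labels to `range(n_way)` in every episode keeps the classifier head independent of the global class vocabulary, which is what lets a meta-learner train on a stream of such tasks and then adapt to unseen classes.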