Diversifying Cross-Domain Few-Shot Learning Via Multimodal Image Editing

Zhipeng Lin, Wenjing Yang, Long Lan, Mingyang Geng, Haotian Wang, Haoang Chi, Xueqiong Li, Ji Wang
DOI: https://doi.org/10.1109/icassp48485.2024.10447785
2024-01-01
Abstract: Standing out as one of the most widely used tools in Cross-Domain Few-Shot Learning (CDFSL), data augmentation forms the bedrock of numerous recent advances. However, current augmentations in CDFSL have limited ability to modify high-level semantic attributes, so the augmented data lack diversity along key semantic dimensions. One of the most promising tools for editing key semantic attributes of images, e.g., backgrounds, is image-to-image generation via large multimodal models (LMMs). Given the promising image-editing results of recent LMMs, we investigate leveraging LMMs to increase data diversity for CDFSL. We propose a novel method named Multimodal Few-shot Image Editing (MFIE), which uses LMMs to automatically translate class-specific images into class-agnostic natural language descriptions of various key semantic attributes in the target domains, and then edits the original images based on these class-agnostic descriptions. To filter out corrupted edits that disturb the class-specific information, we apply semantic filtering based on image-language similarity. Experiments on Meta-Dataset show that MFIE surpasses state-of-the-art (SOTA) CDFSL algorithms.
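The abstract describes a three-step augmentation loop: obtain class-agnostic descriptions, edit each original image with them, and discard edits whose image-language similarity suggests the class content was damaged. The sketch below illustrates only that loop under stated assumptions; it is not the authors' implementation. The abstract does not name the image/text encoder, the filtering criterion, or a threshold, so the following are all hypothetical: embeddings are assumed to come from some CLIP-style dual encoder, an edit is kept only if its own class prompt is the most similar prompt and that similarity clears `threshold`, and `edit` and `embed_image` are placeholder callables standing in for the LMM editor and the image encoder.

```python
import math
from typing import Any, Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_filter(
    image_embeddings: Sequence[Vector],
    labels: Sequence[int],
    class_text_embeddings: Sequence[Vector],
    threshold: float = 0.25,  # hypothetical value; not given in the abstract
) -> list[int]:
    """Return indices of edited images that still match their own class.

    Assumed criterion: keep an edit only if its own class prompt is the
    most similar prompt AND that similarity is at least `threshold`.
    """
    kept = []
    for i, (emb, label) in enumerate(zip(image_embeddings, labels)):
        sims = [cosine_similarity(emb, t) for t in class_text_embeddings]
        best = max(range(len(sims)), key=sims.__getitem__)
        if best == label and sims[label] >= threshold:
            kept.append(i)
    return kept


def augment_with_descriptions(
    images: Sequence[Any],
    labels: Sequence[int],
    descriptions: Sequence[str],  # assumed to be produced by an LMM elsewhere
    edit: Callable[[Any, str], Any],  # hypothetical stand-in for the LMM image editor
    embed_image: Callable[[Any], Vector],  # hypothetical stand-in for the image encoder
    class_text_embeddings: Sequence[Vector],
    threshold: float = 0.25,
) -> tuple[list[Any], list[int]]:
    """Edit every image with every description, then keep only edits that pass the filter."""
    edited, edited_labels = [], []
    for img, label in zip(images, labels):
        for desc in descriptions:
            edited.append(edit(img, desc))
            edited_labels.append(label)  # an edit inherits the label of its source image
    embeddings = [embed_image(x) for x in edited]
    kept = semantic_filter(embeddings, edited_labels, class_text_embeddings, threshold)
    return [edited[i] for i in kept], [edited_labels[i] for i in kept]
```

With toy 2-D embeddings and two class prompts `[[1.0, 0.0], [0.0, 1.0]]`, `semantic_filter([[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]], [0, 1, 0], prompts, threshold=0.5)` keeps indices `[0, 1]`: the third edit is labelled class 0 but now looks most like class 1, so it is treated as corrupted. Requiring the argmax to match the label is one simple way to catch edits that drift toward another class; a pure threshold on the own-class similarity would be an equally plausible reading of the abstract.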