Cross-Modal Knowledge Distillation For Fine-Grained One-Shot Classification

Jiabao Zhao, Xin Lin, Yifan Yang, Jing Yang, Liang He
DOI: https://doi.org/10.1109/ICASSP39728.2021.9414480
2021-06-06
Abstract: Few-shot learning can recognize a novel category from only a few samples because the model learns to learn from many labeled samples during training. When training data is insufficient, performance suffers, and a large-scale fine-grained dataset with annotations is expensive to obtain. In this paper, we adopt domain-specific knowledge to compensate for insufficient annotated data. We propose a cross-modal knowledge distillation (CMKD) framework for fine-grained one-shot classification, together with a Spatial Relation Loss (SRL) that transfers cross-modal information and tackles the semantic gap between multimodal features. The teacher network distills the spatial relationships among the samples as a soft target for training a unimodal student network. Notably, at application time the student network makes predictions from only a few samples, without any external knowledge. The framework is model-agnostic and can be readily adapted to other few-shot models. Extensive experimental results on benchmarks demonstrate that CMKD makes full use of cross-modal knowledge in image and text few-shot classification. CMKD significantly improves the performance of student networks, even when the student is a state-of-the-art network.
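The abstract does not give the formula for SRL, so the following is only an illustrative sketch of the general idea of using the spatial relationships among samples as a soft target. It assumes a generic relational distance-matching loss: pairwise Euclidean distances between sample embeddings are computed separately in the teacher space and the student space, each distance matrix is normalized by its mean, and the student is penalized for the squared difference. The function names (pairwise_distances, normalize_by_mean, relation_loss) and the choice of Euclidean distance and squared error are assumptions made for illustration, not the paper's definition.

```python
import math


def pairwise_distances(embeddings):
    """Euclidean distance between every pair of embedding vectors."""
    n = len(embeddings)
    return [[math.dist(embeddings[i], embeddings[j]) for j in range(n)]
            for i in range(n)]


def normalize_by_mean(dist):
    """Divide by the mean off-diagonal distance so that teacher and
    student spaces with different scales become comparable."""
    n = len(dist)
    if n < 2:
        raise ValueError("need at least two samples to form a relation")
    off_diag = [dist[i][j] for i in range(n) for j in range(n) if i != j]
    mean = sum(off_diag) / len(off_diag)
    if mean == 0:
        return [row[:] for row in dist]
    return [[d / mean for d in row] for row in dist]


def relation_loss(teacher_emb, student_emb):
    """Hypothetical relational loss (an assumption, not the paper's SRL):
    mean squared difference between normalized distance matrices.
    Teacher and student embeddings may have different dimensions,
    because only sample-to-sample distances are compared."""
    t = normalize_by_mean(pairwise_distances(teacher_emb))
    s = normalize_by_mean(pairwise_distances(student_emb))
    n = len(t)
    return sum((t[i][j] - s[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


# Teacher: 3-D multimodal embeddings; student: 2-D unimodal embeddings.
teacher = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
student_same_layout = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
student_other_layout = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]

loss_same = relation_loss(teacher, student_same_layout)
loss_other = relation_loss(teacher, student_other_layout)
```

In this sketch, a student whose embeddings preserve the teacher's relative layout (here, a uniformly scaled copy in a lower dimension) incurs a loss of approximately zero, while a student that arranges the same samples differently incurs a positive loss. Comparing relations rather than raw features is one way such a loss can sidestep the mismatch between multimodal teacher features and unimodal student features.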
Computer Science