Multi-Level Correlation Network For Few-Shot Image Classification

Yunkai Dang, Min Zhang, Zhengyu Chen, Xinliang Zhang, Zheng Wang, Meijun Sun, Donglin Wang
DOI: https://doi.org/10.1109/ICME55011.2023.00494
2024-12-04
Abstract: Few-shot image classification (FSIC) aims to recognize novel classes given few labeled images from base classes. Recent works have achieved promising classification performance, especially for metric-learning methods, where a measure at only image feature level is usually used. In this paper, we argue that measure at such a level may not be effective enough to generalize from base to novel classes when using only a few images. Instead, a multi-level descriptor of an image is taken for consideration in this paper. We propose a multi-level correlation network (MLCN) for FSIC to tackle this problem by effectively capturing local information. Concretely, we present the self-correlation module and cross-correlation module to learn the semantic correspondence relation of local information based on learned representations. Moreover, we propose a pattern-correlation module to capture the pattern of fine-grained images and find relevant structural patterns between base classes and novel classes. Extensive experiments and analysis show the effectiveness of our proposed method on four widely-used FSIC benchmarks. The code for our approach is available at: https://github.com/Yunkai696/MLCN.
Computer Vision and Pattern Recognition, Computation and Language
What problem does this paper attempt to address?
The problem this paper attempts to solve is **the model's insufficient ability to generalize from base classes to new classes in few-shot image classification (FSIC)**, especially when only a small number of images are available. Specifically, existing metric learning methods usually measure similarity only at the image feature level, which may degrade performance when there are distribution differences between the base classes and the new classes.

### Problem Background

Few-shot image classification aims to identify new classes from only a small number of labeled images. Compared with traditional image classification, the biggest challenge of FSIC is that the label spaces of the base classes and the new classes are inconsistent: the labels of the new classes are never seen among the base classes. This makes it difficult for the model to generalize effectively from the base classes to the new classes.

### Core Problem of the Paper

The paper points out that existing methods usually use only global information (such as the overall features of the image) in metric learning, while ignoring local information (such as the foreground). When the background differs between the base classes and the new classes, or there is a distribution shift, relying on global information may reduce the generalization ability of the model. The paper therefore proposes a multi-level correlation network (MLCN) that captures local information effectively, improving generalization in few-shot image classification.

### Solution

To solve the above problems, the paper proposes the following innovations:

1. **Self-Correlation Module**: learns the discriminative object regions in the query and support sets.
2. **Cross-Correlation Module**: learns the semantic correspondence between the query and the support set.
3. **Pattern-Correlation Module**: captures the relevant structural patterns of fine-grained images and finds the relevant structures between the base classes and the new classes.

By combining these three modules, MLCN can capture local information more effectively, thereby improving the generalization ability of the model.

### Experimental Results

The experimental results show that MLCN achieves significant performance improvements on four widely used FSIC benchmark datasets (miniImageNet, tieredImageNet, CUB-200-2011 and CIFAR-FS), verifying its effectiveness.

### Summary

The main contributions of the paper are:

- Verifying that removing background information can significantly improve the performance of the model.
- Proposing the self-correlation module and the cross-correlation module to learn the semantic correspondence of local information.
- Proposing the pattern-correlation module to capture the relevant structural patterns of fine-grained images.

These innovations allow MLCN to perform well on the few-shot image classification task and improve the generalization ability of the model.
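
The contrast drawn above between an image-level measure and a correlation over local descriptors can be illustrated with a small sketch. This is not the authors' implementation: the function names (`cross_correlation`, `global_similarity`, `local_similarity`), the toy two-dimensional descriptors, and the "best match per query position" scoring rule are all assumptions chosen for illustration. Each image is represented as a list of local descriptors, one per spatial position.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cross_correlation(query_feats, support_feats):
    """Cosine similarity between every query position and every support position.

    Returns a matrix with len(query_feats) rows and len(support_feats) columns.
    """
    q = [l2_normalize(v) for v in query_feats]
    s = [l2_normalize(v) for v in support_feats]
    return [[sum(a * b for a, b in zip(qv, sv)) for sv in s] for qv in q]


def global_similarity(query_feats, support_feats):
    """Image-level baseline: cosine similarity of average-pooled descriptors."""
    def mean_pool(feats):
        return [sum(col) / len(feats) for col in zip(*feats)]

    q = l2_normalize(mean_pool(query_feats))
    s = l2_normalize(mean_pool(support_feats))
    return sum(a * b for a, b in zip(q, s))


def local_similarity(query_feats, support_feats):
    """Hypothetical local score: mean over query positions of the best support match."""
    corr = cross_correlation(query_feats, support_feats)
    return sum(max(row) for row in corr) / len(corr)


# Toy images (assumed data): position 0 is a shared "foreground" descriptor,
# position 1 is a "background" descriptor that differs between the two images.
query = [[1, 0], [0, 1]]
support = [[1, 0], [0, -1]]

print(global_similarity(query, support))  # background mismatch cancels the foreground match
print(local_similarity(query, support))   # the foreground match still contributes
```

In this toy case the pooled, image-level similarity is dragged down by the mismatched background, while the position-wise correlation still registers the matching foreground. A learned model would operate on feature maps from a trained backbone rather than hand-written vectors.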