Semantic Prompt for Few-Shot Image Recognition

Wentao Chen, Chenyang Si, Zhang Zhang, Liang Wang, Zilei Wang, Tieniu Tan
2023-03-25
Abstract: Few-shot learning is a challenging problem since only a few examples are provided to recognize a new class. Several recent studies exploit additional semantic information, e.g., text embeddings of class names, to address the issue of rare samples through combining semantic prototypes with visual prototypes. However, these methods still suffer from the spurious visual features learned from the rare support samples, resulting in limited benefits. In this paper, we propose a novel Semantic Prompt (SP) approach for few-shot learning. Instead of the naive exploitation of semantic information for remedying classifiers, we explore leveraging semantic information as prompts to tune the visual feature extraction network adaptively. Specifically, we design two complementary mechanisms to insert semantic prompts into the feature extractor: one is to enable the interaction between semantic prompts and patch embeddings along the spatial dimension via self-attention; the other is to supplement visual features with the transformed semantic prompts along the channel dimension. By combining these two mechanisms, the feature extractor presents a better ability to attend to the class-specific features and obtains more generalized image representations with merely a few support samples. Through extensive experiments on four datasets, the proposed approach achieves promising results, improving the 1-shot learning accuracy by 3.67% on average.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses **few-shot image recognition**. Specifically:

1. **Background and Problem**:
   - In few-shot learning, it is difficult to train an effective classifier because only a few annotated samples are provided for each new category.
   - Some recent studies try to alleviate this scarcity by using additional semantic information (such as text embeddings of class names), combining semantic prototypes with visual prototypes.
2. **Limitations of Existing Methods**:
   - Although existing methods use text information to some extent, they are still affected by spurious visual features learned from the few support samples, which limits the benefit.
   - These methods use semantic information only to remedy the classifier, rather than to improve the visual feature extraction network itself.
3. **Proposed Method**:
   - The paper proposes a "Semantic Prompt (SP)" method, which uses the semantic information of a class as a prompt to adaptively tune the visual feature extraction network.
   - It designs two complementary mechanisms to insert semantic prompts into the feature extractor: one lets semantic prompts interact with patch embeddings along the spatial dimension via self-attention; the other supplements visual features with the transformed semantic prompts along the channel dimension.
   - By combining these two mechanisms, the feature extractor attends better to class-specific features and obtains more generalized image representations from only a few support samples.
4. **Experimental Results**:
   - In experiments on four datasets, the proposed method improves 1-shot accuracy by 3.67% on average.

In summary, the paper aims to improve the visual feature extraction process in few-shot learning by effectively utilizing text information, thereby enhancing the recognition accuracy of new categories.
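
The two prompt-insertion mechanisms described above can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the function names (`spatial_interaction`, `channel_interaction`), the identity query/key/value projections, the single attention head, and the plain linear map `weight` are all assumptions chosen to keep the example short. It only shows the general shape of the idea: a semantic prompt joins the patch tokens for self-attention (spatial dimension), and a transformed prompt is added to every patch feature (channel dimension).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def spatial_interaction(patches, prompt):
    """Append the prompt as an extra token and apply one self-attention step.

    Assumption: identity Q/K/V projections and a single head, for brevity.
    Returns only the updated patch tokens (the prompt token is dropped).
    """
    tokens = patches + [prompt]
    dim = len(prompt)
    updated = []
    for query in tokens:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in tokens])
        updated.append(
            [sum(w * value[c] for w, value in zip(weights, tokens)) for c in range(dim)]
        )
    return updated[:-1]


def channel_interaction(patches, prompt, weight):
    """Add a linearly transformed prompt to every patch along the channels.

    Assumption: `weight` is a hypothetical (dim x dim) projection matrix.
    """
    shift = [dot(row, prompt) for row in weight]
    return [[p + s for p, s in zip(patch, shift)] for patch in patches]


# Toy inputs: three 2-channel patch embeddings and one semantic prompt.
patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
prompt = [1.0, 0.0]
weight = [[0.1, 0.0], [0.0, 0.1]]

attended = spatial_interaction(patches, prompt)
fused = channel_interaction(attended, prompt, weight)
```

In this toy setup the prompt pulls every attended patch toward the prompt's direction, and the channel step then shifts each patch by the same prompt-derived offset. A real feature extractor would use learned projections and apply these steps inside its transformer layers.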