Simple Semantic-Aided Few-Shot Learning

Hai Zhang, Junzhe Xu, Shanlin Jiang, Zhenan He
2024-04-09
Abstract: Learning from a limited amount of data, namely Few-Shot Learning, stands out as a challenging computer vision task. Several works exploit semantics and design complicated semantic fusion mechanisms to compensate for rare representative features within restricted data. However, relying on naive semantics such as class names introduces biases due to their brevity, while acquiring extensive semantics from external knowledge takes considerable time and effort. This limitation severely constrains the potential of semantics in Few-Shot Learning. In this paper, we design an automatic way called Semantic Evolution to generate high-quality semantics. The incorporation of high-quality semantics alleviates the need for complex network structures and learning algorithms used in previous works. Hence, we employ a simple two-layer network termed Semantic Alignment Network to transform semantics and visual features into robust class prototypes with rich discriminative features for few-shot classification. The experimental results show that our framework outperforms all previous methods on six benchmarks, demonstrating that a simple network with high-quality semantics can beat intricate multi-modal modules on few-shot classification tasks. Code is available at
Computer Science
What problem does this paper attempt to address?
The paper addresses the few-shot learning (FSL) problem in computer vision, in which a model must learn new concepts from only a handful of labeled examples. Traditional methods rely on complex network architectures and learning algorithms to extract representative features from images, but this approach is not effective when data is scarce. The paper points out that relying solely on category names (for example, a single noun) as semantic information may introduce ambiguity, while obtaining rich semantic knowledge from external sources is time-consuming and laborious.

To address these issues, the paper introduces a simple framework called "Semantic-Aided Few-Shot Learning (SemFew)". The framework consists of two main parts. The first, Semantic Evolution, is an automatic process that generates high-quality semantic descriptions to compensate for the limitations of category names. It first converts category names into concise descriptions that match the image content, and then expands and rewrites these descriptions to include more category-related knowledge. The second, the Semantic Alignment Network (SemAlign), does not require a complex semantic understanding module. Instead, it uses a simple two-layer network to fuse the input semantics and visual features, reconstructing more stable and discriminative class prototypes for more accurate classification.

Experimental results show that SemFew outperforms previous methods on six benchmark datasets, indicating that a simple network combined with high-quality semantic information can achieve excellent performance in few-shot classification tasks. The paper also investigates the influence of different types of semantic sources, prototype selection strategies, and classifiers on the results.
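
To make the prototype-fusion idea described above more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that each class's visual prototype is the mean of its support features, that the two-layer network takes the concatenation of that prototype and a semantic embedding as input, and that queries are classified by cosine similarity to the reconstructed prototypes. The function names, feature dimensions, ReLU activation, and random (untrained) weights are all illustrative assumptions; the summary specifies none of them.

```python
import numpy as np


def init_two_layer(in_dim, hidden_dim, out_dim, rng):
    """Create untrained weights for a two-layer perceptron (illustrative only)."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }


def fuse(params, visual_proto, semantic_emb):
    """Map [visual prototype ; semantic embedding] to a reconstructed prototype.

    Concatenation and ReLU are assumptions made for this sketch.
    """
    x = np.concatenate([visual_proto, semantic_emb], axis=-1)
    hidden = np.maximum(x @ params["W1"] + params["b1"], 0.0)
    return hidden @ params["W2"] + params["b2"]


def cosine_classify(queries, prototypes):
    """Assign each query to the prototype with the highest cosine similarity."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return np.argmax(q @ p.T, axis=1)


# A toy 5-way 1-shot episode with random stand-ins for real features.
rng = np.random.default_rng(0)
n_way, n_shot, n_query = 5, 1, 15
visual_dim, semantic_dim, hidden_dim = 64, 32, 128

support = rng.normal(size=(n_way, n_shot, visual_dim))   # image features
semantics = rng.normal(size=(n_way, semantic_dim))       # text embeddings
queries = rng.normal(size=(n_query, visual_dim))

visual_protos = support.mean(axis=1)                     # one prototype per class
params = init_two_layer(visual_dim + semantic_dim, hidden_dim, visual_dim, rng)
fused_protos = fuse(params, visual_protos, semantics)    # shape (n_way, visual_dim)
predictions = cosine_classify(queries, fused_protos)     # class index per query
```

In practice the weights would be trained so that the reconstructed prototypes lie close to well-estimated class centers; the training objective is not described in this summary and is left out of the sketch.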