Abstract: This paper addresses the few-shot image classification problem, in which unlabeled query samples must be classified given only a small number of labeled support samples. One major challenge of few-shot learning is the large variety in an object's visual appearance, which prevents the support samples from representing that object comprehensively. This can result in a significant difference between support and query samples, undermining the performance of few-shot algorithms. In this paper, we tackle the problem by proposing the Few-shot Cosine Transformer (FS-CT), in which the relational map between supports and queries is effectively obtained for few-shot tasks. FS-CT consists of two parts: a learnable prototypical embedding network that obtains categorical representations from support samples with hard cases, and a transformer encoder that effectively computes the relational map between support and query samples. We introduce Cosine Attention, a more robust and stable attention module that enhances the transformer module significantly and thereby improves FS-CT accuracy by 5% to over 20% compared to the default scaled dot-product mechanism. Our method achieves competitive results on mini-ImageNet, CUB-200, and CIFAR-FS in 1-shot and 5-shot learning tasks across backbones and few-shot configurations. We also developed a custom few-shot dataset for Yoga pose recognition to demonstrate the potential of our algorithm for practical application. Our FS-CT with cosine attention is a lightweight, simple few-shot algorithm that can be applied to a wide range of applications, such as healthcare, medical, and security surveillance. The official implementation code of our Few-shot Cosine Transformer is available at https://github.com/vinuni-vishc/Few-Shot-Cosine-Transformer.
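
To make the contrast between the two attention mechanisms concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the exact formulation of Cosine Attention, so this sketch assumes that attention weights are the cosine similarities between query and key vectors, used directly without a softmax. The function names (`scaled_dot_product_attention`, `cosine_attention`) and the plain-list representation are hypothetical choices made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, values):
    """Combine value vectors using one weight per value."""
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def scaled_dot_product_attention(queries, keys, values):
    """Standard attention: softmax(q . k / sqrt(d)) applied to the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        out.append(weighted_sum(softmax(scores), values))
    return out


def cosine_attention(queries, keys, values, eps=1e-8):
    """Assumed form of cosine attention: weights are cosine similarities.

    Each weight lies in [-1, 1] and depends only on the angle between
    query and key, not on their magnitudes. No softmax is applied
    (an assumption of this sketch).
    """
    out = []
    for q in queries:
        q_norm = math.sqrt(dot(q, q))
        weights = [
            dot(q, k) / (q_norm * math.sqrt(dot(k, k)) + eps) for k in keys
        ]
        out.append(weighted_sum(weights, values))
    return out
```

Under this assumption, rescaling a query leaves the cosine attention output essentially unchanged, whereas the softmax in scaled dot-product attention becomes sharper as the query norm grows. That insensitivity to feature magnitude is one plausible reading of the "more robust and stable" behavior claimed above.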

Few-Shot Image Classification Based on Swin Transformer + CSAM + EMD

Enhancing Few-Shot Image Classification With Cosine Transformer

CSN: Component supervised network for few-shot classification

Supervised Contrastive Representation Embedding Based on Transformer for Few-Shot Classification

Few-Shot Fine-Grained Image Classification via Multi-Frequency Neighborhood and Double-Cross Modulation

Attribute- and attention-guided few-shot classification

IFSM: an Iterative Feature Selection Mechanism for Few-Shot Image Classification

Learn to Few-Shot Segment Remote Sensing Images from Irrelevant Data

Balancing Feature Alignment and Uniformity for Few-Shot Classification

Few-Shot Fine-Grained Image Classification: A Comprehensive Review

CTF-SSCL: CNN-Transformer for Few-Shot Hyperspectral Image Classification Assisted by Semisupervised Contrastive Learning

Learning to focus: cascaded feature matching network for few-shot image recognition

Few-Shot Learning Based on Deep Learning for Image Classification

ICSFF: Information Constraint on Self-Supervised Feature Fusion for Few-Shot Remote Sensing Image Classification

SpatialFormer: Semantic and Target Aware Attentions for Few-Shot Learning

Bidirectional Matching Prototypical Network for Few-Shot Image Classification

Adaptive FSS: A Novel Few-Shot Segmentation Framework Via Prototype Enhancement

Multi-Level Correlation Network For Few-Shot Image Classification

Adaptive Local Feature Matching for Few-shot Fine-grained Image Recognition

tSF: Transformer-based Semantic Filter for Few-Shot Learning

DBDC-SSL: Deep Brownian Distance Covariance With Self-Supervised Learning for Few-Shot Image Classification