Abstract: Few-shot learning for fine-grained image classification has recently gained attention in computer vision. Among the approaches to few-shot learning, metric-based methods are state-of-the-art on many tasks owing to their simplicity and effectiveness. Most metric-based methods assume a single similarity measure and thus obtain a single feature space. However, if samples can simultaneously be well classified via two distinct similarity measures, the samples within a class can distribute more compactly in a smaller feature space, producing more discriminative feature maps. Motivated by this, we propose a Bi-Similarity Network (BSNet) that consists of a single embedding module and a bi-similarity module with two similarity measures. After the support images and the query images pass through the convolution-based embedding module, the bi-similarity module learns feature maps according to two similarity measures of diverse characteristics. In this way, the model can learn more discriminative and less similarity-biased features from few shots of fine-grained images, so that its generalization ability can be significantly improved. Through extensive experiments in which established metric/similarity-based networks are slightly modified, we show that the proposed approach produces a substantial improvement on several fine-grained image benchmark datasets. Code is available at: https://github.com/PRIS-CV/BSNet.

What problem does this paper attempt to address?

This paper attempts to address the problem of how to improve the generalization ability of the model and the discriminative power of features in few-shot fine-grained image classification by using two different similarity measures. Specifically, the paper proposes a Bi-Similarity Network (BSNet), which aims to combine two different similarity measurement methods (e.g., Euclidean distance and cosine distance) to enable the model to map samples of the same category more compactly in a smaller feature space, thereby generating more discriminative feature representations.

### Background and Motivation

In few-shot learning, especially in fine-grained image classification tasks, existing metric-based methods usually assume a single similarity measure, which may lead to insufficient generalization ability of the model in small sample cases. The paper points out that if two different similarity measures can be used simultaneously, samples within the same category can be distributed more compactly in the feature space, thereby generating more discriminative feature maps.

### Method Overview

The proposed BSNet consists of two parts:

1. **Embedding Module**: Uses convolutional neural networks to generate feature representations of support images and query images.
2. **Bi-Similarity Module**: Contains two similarity measurement branches that respectively calculate the similarity scores between the query image and each category.

### Training Process

During meta-training, for each task, the query image generates two similarity scores through the two similarity measurement branches, and then two predicted labels are generated based on these two scores. The loss function is the average of the loss values of the two branches, which is used to update the network parameters through backpropagation.
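
The training objective described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the two branches are taken to be negative squared Euclidean distance and cosine similarity (the examples named in this summary), each class is represented by a single vector, and cross-entropy is used as the per-branch loss. The function names (`neg_sq_euclidean`, `cosine`, `cross_entropy`, `bi_similarity_loss`) are hypothetical.

```python
import math


def neg_sq_euclidean(u, v):
    # Assumed branch 1: larger score means closer in Euclidean distance.
    return -sum((a - b) ** 2 for a, b in zip(u, v))


def cosine(u, v):
    # Assumed branch 2: cosine similarity, 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def cross_entropy(scores, label):
    # Softmax cross-entropy over class scores, computed in a stable way.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[label]


def bi_similarity_loss(query, class_reps, label):
    # One score vector per branch, one loss per branch, then the average.
    scores_a = [neg_sq_euclidean(query, c) for c in class_reps]
    scores_b = [cosine(query, c) for c in class_reps]
    return 0.5 * (cross_entropy(scores_a, label) + cross_entropy(scores_b, label))


# Toy 2-way task: the query coincides with the representative of class 0.
query = [1.0, 0.0]
class_reps = [[1.0, 0.0], [0.0, 1.0]]
loss_correct = bi_similarity_loss(query, class_reps, label=0)
loss_wrong = bi_similarity_loss(query, class_reps, label=1)
```

In a real network the averaged loss would be backpropagated through both branches and the shared embedding module, which is why a single embedding has to satisfy both similarity measures at once.
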

### Validation and Testing Process

During validation and testing, the query image is assigned to the category with the highest average similarity score, and the corresponding one-hot encoding vector is generated.

### Experimental Results

The paper conducts experiments on multiple fine-grained image classification benchmark datasets, including FGVC-Aircraft, Stanford-Cars, Stanford-Dogs, and CUB-200-2011. The experimental results show that BSNet significantly improves the performance of few-shot classification on these datasets.

### Main Contributions

1. Proposes a Bi-Similarity Network (BSNet) that combines two similarity measures, significantly improving the performance of four state-of-the-art few-shot classification methods on four fine-grained image datasets.
2. Demonstrates that the model complexity of BSNet is lower than the average complexity of two single-similarity networks, despite BSNet containing more model parameters.
3. Visualizes that BSNet can learn the discriminative regions of the input images.

In summary, this paper effectively improves the performance of few-shot fine-grained image classification tasks by introducing dual similarity measures, providing new ideas and methods for research in this field.
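
The validation and testing rule described above (assign the query to the category with the highest average similarity score, then emit a one-hot vector) can be illustrated as follows. This is a minimal sketch, assuming both branches output per-class scores on a comparable scale (for example, softmax probabilities); the function name `predict_one_hot` is hypothetical and not taken from the paper or its repository.

```python
def predict_one_hot(scores_a, scores_b):
    # Average the two branches' per-class scores.
    avg = [(a + b) / 2 for a, b in zip(scores_a, scores_b)]
    # Pick the class with the highest average score (first one on ties).
    best = max(range(len(avg)), key=avg.__getitem__)
    # Encode the predicted class as a one-hot vector.
    return [1 if i == best else 0 for i in range(len(avg))]


# Both branches lean towards class 0.
agree = predict_one_hot([0.7, 0.2, 0.1], [0.5, 0.3, 0.2])

# The branches disagree; the more confident second branch decides.
disagree = predict_one_hot([0.4, 0.35, 0.25], [0.1, 0.8, 0.1])
```

Averaging lets a confident branch override an uncertain one, so a query is misclassified only when the combined evidence from both similarity measures points to the wrong category.
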

BSNet: Bi-Similarity Network for Few-shot Fine-grained Image Classification

Multi-Similarity Enhancement Network for Few-Shot Segmentation.

Local Spatial Alignment Network for Few-Shot Learning

Bidirectional Matching Prototypical Network for Few-Shot Image Classification

Local Mutual Metric Network for Few-Shot Image Classification

Local Feature Semantic Alignment Network for Few-Shot Image Classification

A Two-Stream Network with Image-to-Class Deep Metric for Few-Shot Classification.

S3Net: Spectral–Spatial Siamese Network for Few-Shot Hyperspectral Image Classification

SPNet: Siamese-Prototype Network for Few-Shot Remote Sensing Image Scene Classification

Low-Rank Pairwise Alignment Bilinear Network For Few-Shot Fine-Grained Image Classification

Feature fusion network based on few-shot fine-grained classification

An Unbiased Feature Estimation Network for Few-Shot Fine-Grained Image Classification

Bilaterally Normalized Scale-Consistent Sinkhorn Distance for Few-Shot Image Classification

Sampling-invariant fully metric learning for few-shot object detection

Spatial Attention Network for Few-Shot Learning

Multi-Level Correlation Network For Few-Shot Image Classification

Category Relevance Redirection Network for Few-Shot Classification

Compare More Nuanced: Pairwise Alignment Bilinear Network For Few-shot Fine-grained Learning

Augmented Bi-path Network for Few-shot Learning

Siamese Transformer Networks for Few-shot Image Classification

Shared Nearest Neighbor Calibration for Few-Shot Classification.