Abstract: Few-shot image classifiers are designed to recognize and classify new data with minimal supervision and limited data, but they often rely on spurious correlations between classes and spurious attributes, a phenomenon known as spurious bias. Spurious correlations commonly hold in certain samples, and few-shot classifiers can suffer from the spurious bias these samples induce. No automatic benchmarking system currently exists to assess the robustness of few-shot classifiers against spurious bias. In this paper, we propose a systematic and rigorous benchmark framework, termed FewSTAB, to fairly demonstrate and quantify varied degrees of robustness of few-shot classifiers to spurious bias. FewSTAB creates few-shot evaluation tasks with biased attributes, so that classifiers relying on these attributes for prediction perform poorly. To construct these tasks, we propose attribute-based sample selection strategies based on a pre-trained vision-language model, eliminating the need for manual dataset curation. This allows FewSTAB to automatically benchmark spurious bias using any existing test data. FewSTAB offers evaluation results in a new dimension along with a new design guideline for building robust classifiers. Moreover, it can benchmark spurious bias in varied degrees and enable designs for varied degrees of robustness. Its effectiveness is demonstrated through experiments on ten few-shot learning methods across three datasets. We hope our framework can inspire new designs of robust few-shot classifiers. Our code is available at https://github.com/gtzheng/FewSTAB.
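The abstract does not spell out the selection procedure, so what follows is only a minimal sketch of the general idea, not FewSTAB's actual algorithm. It assumes each test image already comes with a set of attribute words (for example, extracted by a pre-trained vision-language model; that step is not shown), and the helper `build_biased_task` and its parameters are hypothetical. For every class in an N-way task it pairs the class with one attribute, draws support images of that class that contain the attribute, and draws query images of that class that instead contain the attribute paired with another class. A classifier that keys on attributes rather than on the class itself is therefore pushed toward the wrong label.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

# One test sample: (sample_id, class_label, attribute words detected in the image).
# Attribute sets are assumed to be precomputed, e.g. from VLM outputs.
Sample = Tuple[str, str, Set[str]]


def build_biased_task(
    samples: List[Sample],
    class_attr: Dict[str, str],
    k_shot: int,
    n_query: int,
    seed: int = 0,
) -> Optional[Tuple[List[Sample], List[Sample]]]:
    """Hypothetical sketch of an attribute-biased N-way K-shot task.

    class_attr pairs each class in the task with one attribute. Support
    images of class c contain c's attribute; query images of class c
    contain the attribute paired with another class instead. Returns None
    when some class has too few qualifying samples.
    """
    if len(class_attr) < 2:
        raise ValueError("need at least two classes")
    rng = random.Random(seed)
    classes = sorted(class_attr)
    support: List[Sample] = []
    query: List[Sample] = []
    for i, c in enumerate(classes):
        own = class_attr[c]
        # The attribute of the next class (cyclically) acts as the misleading cue.
        other = class_attr[classes[(i + 1) % len(classes)]]
        in_class = [s for s in samples if s[1] == c]
        sup_pool = [s for s in in_class if own in s[2] and other not in s[2]]
        qry_pool = [s for s in in_class if other in s[2] and own not in s[2]]
        if len(sup_pool) < k_shot or len(qry_pool) < n_query:
            return None
        support.extend(rng.sample(sup_pool, k_shot))
        query.extend(rng.sample(qry_pool, n_query))
    return support, query


# Toy usage: support shows dogs on grass and cats on sand;
# the query set flips the backgrounds.
data: List[Sample] = [
    ("img1", "dog", {"grass"}),
    ("img2", "dog", {"grass"}),
    ("img3", "dog", {"sand"}),
    ("img4", "cat", {"sand"}),
    ("img5", "cat", {"sand"}),
    ("img6", "cat", {"grass"}),
]
task = build_biased_task(data, {"dog": "grass", "cat": "sand"}, k_shot=2, n_query=1)
```

A classifier that has learned "grass means dog" from the support set will label the cat on grass as a dog, which is exactly the kind of failure such tasks are meant to expose. Pairing each class with the next class's attribute is an arbitrary choice made for brevity; other pairings would serve the same purpose.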

What problem does this paper attempt to address?

This paper addresses how to evaluate the robustness of few-shot image classifiers to spurious bias. Few-shot image classifiers aim to recognize and classify new data with minimal supervision and limited data, but they often rely on spurious correlations between classes and spurious attributes, which can degrade their performance in practical applications.

**Main problems**:

1. **The influence of spurious correlations**: Spurious correlations are associations between classes and non-essential attributes of the input data, and these associations hold only in certain samples. For example, if the backgrounds of images in a certain class of the training set are always a specific color or texture, the classifier may learn to predict from these background features instead of the features that actually distinguish the class.
2. **Lack of an automated benchmarking system**: No dedicated automated framework currently exists for evaluating the robustness of few-shot classifiers to spurious bias. Existing benchmarking methods usually cannot control spurious correlations in test tasks, so their evaluation results can be unfair.

To address these problems, the authors propose a systematic and rigorous benchmarking framework named FewSTAB that fairly demonstrates and quantifies varied degrees of robustness of few-shot classifiers to spurious bias. FewSTAB achieves this as follows:

- **Constructing few-shot evaluation tasks with biased attributes**: FewSTAB creates few-shot evaluation tasks containing samples with biased attributes, so that a classifier that uses these attributes for prediction performs poorly.
- **Attribute-based sample selection with a pre-trained vision-language model**: FewSTAB uses a pre-trained vision-language model (VLM) to automatically identify attributes in images and selects samples based on them, so evaluation tasks can be constructed without manually annotating the dataset.

With these components, FewSTAB can automatically evaluate how existing few-shot classifiers perform under spurious bias and provides new design guidelines for building more robust classifiers.

### Summary

The main contributions of this paper are:

1. A systematic and rigorous benchmarking framework, FewSTAB, that targets the spurious bias problem in few-shot classifiers and reveals their varied degrees of robustness to it.
2. Attribute-based sample selection strategies built on a pre-trained vision-language model, which allow existing few-shot benchmark datasets to be reused for evaluation.
3. A new evaluation dimension and new design guidelines for building robust few-shot classifiers.

In experiments, FewSTAB was used to evaluate ten few-shot learning methods on three datasets, demonstrating its effectiveness and practicality.
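The summary says FewSTAB adds a new evaluation dimension but does not state how results are aggregated. As an assumption, one simple way to express that dimension is to compare mean accuracy on ordinary randomly sampled tasks with mean and worst-case accuracy on attribute-biased tasks. The function names and the `drop` metric below are hypothetical illustrations, not the metrics reported by FewSTAB.

```python
from statistics import mean
from typing import Dict, Sequence


def task_accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of query samples predicted correctly in one few-shot task."""
    if not labels or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and of equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def robustness_report(
    standard_accs: Sequence[float], biased_accs: Sequence[float]
) -> Dict[str, float]:
    """Hypothetical summary comparing per-task accuracies on standard tasks
    with those on attribute-biased tasks. A larger drop suggests heavier
    reliance on spurious attributes."""
    if not standard_accs or not biased_accs:
        raise ValueError("both accuracy lists must be non-empty")
    std = mean(standard_accs)
    biased = mean(biased_accs)
    return {
        "standard_mean": std,
        "biased_mean": biased,
        "biased_worst": min(biased_accs),
        "drop": std - biased,
    }


# Dummy inputs for illustration only (not results from the paper).
acc = task_accuracy(["cat", "dog", "dog", "cat"], ["cat", "dog", "cat", "cat"])
report = robustness_report(standard_accs=[0.75, 1.0], biased_accs=[0.5, 0.25])
```

Reporting the worst biased task alongside the mean is a deliberate choice in this sketch: a classifier can look acceptable on average while failing badly on the specific attribute pairings that expose its spurious bias.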

Benchmarking Spurious Bias in Few-Shot Image Classifiers
