Oops, I Sampled it Again: Reinterpreting Confidence Intervals in Few-Shot Learning

Raphael Lafargue, Luke Smith, Franck Vermet, Mathias Löwe, Ian Reid, Vincent Gripon, Jack Valmadre

2024-09-06

Abstract: The predominant method for computing confidence intervals (CIs) in few-shot learning (FSL) is based on sampling the tasks with replacement, i.e., allowing the same samples to appear in multiple tasks. This makes the CI misleading, in that it takes into account the randomness of the sampler but not that of the data itself. To quantify the extent of this problem, we conduct a comparative analysis between CIs computed with and without replacement. These reveal a notable underestimation by the predominant method. This observation calls for a reevaluation of how we interpret confidence intervals and the resulting conclusions in FSL comparative studies. Our research demonstrates that the use of paired tests can partially address this issue. Additionally, we explore methods to further reduce the size of the CI by strategically sampling tasks of a specific size. We also introduce a new optimized benchmark, which can be accessed at https://github.com/RafLaf/FSL-benchmark-again.
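The comparison in the abstract can be illustrated with a minimal simulation (a sketch, not the authors' benchmark code). It assumes a simplification: each query sample's correctness is a fixed 0/1 value, ignoring how the support set affects predictions, and each CI is the usual normal approximation, mean ± 1.96·s/√T over T task accuracies. The helpers `closed_ci` and `open_ci` and all numbers are hypothetical.

```python
import math
import random
import statistics

Z95 = 1.96  # two-sided 95% normal quantile


def ci_half_width(task_accs):
    """Normal-approximation 95% CI half-width for the mean task accuracy."""
    return Z95 * statistics.stdev(task_accs) / math.sqrt(len(task_accs))


def closed_ci(correct, num_tasks, q, rng):
    """Closed CI (hypothetical helper): each task draws q distinct queries, but
    tasks are drawn independently, so a sample can reappear across tasks."""
    accs = [statistics.fmean(rng.sample(correct, q)) for _ in range(num_tasks)]
    return statistics.fmean(accs), ci_half_width(accs)


def open_ci(correct, q, rng):
    """Open CI (hypothetical helper): tasks are disjoint, so every sample is
    used at most once and only len(correct) // q tasks can be formed."""
    pool = list(correct)
    rng.shuffle(pool)
    num_tasks = len(pool) // q
    accs = [statistics.fmean(pool[i * q:(i + 1) * q]) for i in range(num_tasks)]
    return statistics.fmean(accs), ci_half_width(accs)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical evaluation set: 600 queries, model correct on roughly 70%.
    correct = [1 if rng.random() < 0.7 else 0 for _ in range(600)]
    for t in (100, 10_000):
        mean, hw = closed_ci(correct, t, q=15, rng=rng)
        print(f"closed, T={t:>6}: {mean:.3f} ± {hw:.3f}")
    mean, hw = open_ci(correct, q=15, rng=rng)
    print(f"open,   T={600 // 15:>6}: {mean:.3f} ± {hw:.3f}")
```

Raising T from 100 to 10,000 shrinks the closed half-width about tenfold even though the same 600 samples are reused. The open interval, by contrast, is limited to 600 // 15 = 40 disjoint tasks. The closed interval therefore reflects the sampler's randomness rather than uncertainty about the data.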

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

The paper examines how confidence intervals (CIs) are computed in few-shot learning (FSL) and proposes improved methods. Specifically:

1. **Problems with existing methods**: The predominant way to compute CIs generates tasks by sampling with replacement, so the same samples can appear in many tasks. The resulting CI accounts only for the randomness of the task sampler and ignores the randomness of the data itself. The paper calls these "Closed Confidence Intervals" (CCIs).
2. **Open Confidence Intervals (OCIs)**: OCIs are computed from tasks sampled without replacement, so each sample is used at most once and the interval also reflects the variability of the data. Their drawback is that they limit the number of tasks that can be generated, especially on small datasets, which can make the interval wider.
3. **Comparative analysis**: The paper compares CCIs and OCIs experimentally on several standard vision datasets. On small datasets, CCIs are significantly narrower than OCIs, whereas on large datasets the opposite holds. When accuracy approaches 100%, both kinds of CI become narrower, because saturated accuracy has lower variance.
4. **Paired tests**: To make comparisons more reliable, the paper uses paired tests, which evaluate the methods being compared on the same set of tasks. This reduces the influence of differences in task difficulty and makes the conclusions more reliable.
5. **Optimizing task size**: To narrow the CI further, the paper studies how to choose the task size. Increasing the number of query samples per task (Q) reduces the variance of each task's accuracy, but it also reduces the number of tasks that can be generated. There is therefore an optimal Q that minimizes the CI width.

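The task-size trade-off can be illustrated with a simple variance model. This model is an assumption for illustration and not necessarily the one used in the paper. A task's accuracy is taken to have variance `var_task + var_query / q`, where `var_task` captures task-to-task difficulty (for example, from the sampled classes and support set) and `var_query / q` is the noise from averaging over q query samples per class. If each task consumes k support and q query samples per class without replacement, about `n_per_class / (k + q)` disjoint tasks fit in the dataset. Minimizing (variance / number of tasks) then gives Q* = sqrt(k · var_query / var_task). All function names and numbers below are hypothetical.

```python
import math


def squared_half_width(q, k, n_per_class, var_task, var_query):
    """Hypothetical model of the squared CI half-width, up to the factor 1.96**2.

    Each task uses k support + q query samples per class and tasks are disjoint,
    so roughly n_per_class / (k + q) tasks are available (floor ignored here).
    """
    num_tasks = n_per_class / (k + q)
    var_task_acc = var_task + var_query / q  # variance of one task's accuracy
    return var_task_acc / num_tasks


def best_q_grid(k, n_per_class, var_task, var_query):
    """Integer q in [1, n_per_class - k] minimizing the modeled CI width."""
    return min(
        range(1, n_per_class - k + 1),
        key=lambda q: squared_half_width(q, k, n_per_class, var_task, var_query),
    )


def best_q_closed_form(k, var_task, var_query):
    """Stationary point of (var_task + var_query / q) * (k + q)."""
    return math.sqrt(k * var_query / var_task)


if __name__ == "__main__":
    # Hypothetical values: 5-shot, 600 images per class, ~70% accuracy so the
    # per-query Bernoulli variance is 0.7 * 0.3 = 0.21 (number of ways ignored).
    k, n, v_task, v_query = 5, 600, 0.004, 0.21
    print(best_q_grid(k, n, v_task, v_query))  # 16
    print(round(best_q_closed_form(k, v_task, v_query), 2))  # 16.2
```

Below the optimum, the query noise `var_query / q` dominates the width; above it, the loss of available tasks does. The floor in the task count and any per-way scaling of the query variance are ignored for simplicity.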
In summary, the paper aims to emphasize the importance of correctly understanding and interpreting CI in FSL and proposes a series of improvements to enhance the accuracy and reliability of method comparisons.
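The paired comparison described above can be sketched as a normal-approximation CI on per-task accuracy differences (a paired z-interval). This is one standard form of paired test, assumed here for illustration. The paper's exact procedure may differ. The helper names, simulated task difficulties, and method accuracies below are hypothetical.

```python
import math
import random
import statistics

Z95 = 1.96  # two-sided 95% normal quantile


def paired_diff_ci(acc_a, acc_b):
    """95% CI on the mean per-task difference; acc_a[i] and acc_b[i] must be
    the accuracies of the two methods on the same task i."""
    if len(acc_a) != len(acc_b):
        raise ValueError("paired test needs both methods on the same tasks")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    half = Z95 * statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs), half


def unpaired_diff_ci(acc_a, acc_b):
    """95% CI on the difference of means, treating the two samples as independent."""
    half = Z95 * math.sqrt(
        statistics.variance(acc_a) / len(acc_a)
        + statistics.variance(acc_b) / len(acc_b)
    )
    return statistics.fmean(acc_a) - statistics.fmean(acc_b), half


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical: shared task difficulty dominates, and method A is about
    # 1 point better (a toy model; values are not clipped to [0, 1]).
    difficulty = [rng.gauss(0.70, 0.10) for _ in range(500)]
    acc_a = [d + 0.01 + rng.gauss(0, 0.02) for d in difficulty]
    acc_b = [d + rng.gauss(0, 0.02) for d in difficulty]
    print("paired:   %.4f ± %.4f" % paired_diff_ci(acc_a, acc_b))
    print("unpaired: %.4f ± %.4f" % unpaired_diff_ci(acc_a, acc_b))
```

Both methods share each task's difficulty, so that difficulty cancels in the per-task differences. In this setup, the paired interval is therefore several times narrower than the unpaired one, even though both estimate the same mean difference.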
