Abstract: Fine-tuning modern computer vision models requires accurately labeled data for which the ground truth may not exist, but a set of multiple labels can be obtained from labelers of variable accuracy. We tie the notion of label quality to confidence in labeler accuracy and show that, when prior estimates of labeler accuracy are available, using a simple naive-Bayes model to estimate the true labels allows us to label more data on a fixed budget without compromising label or fine-tuning quality. We present experiments on a dataset of industrial images that demonstrate that our method, called Ground Truth Extension (GTX), enables fine-tuning ML models using fewer human labels.

What problem does this paper attempt to address?

This paper addresses the problem of fine-tuning modern computer vision models efficiently under a limited annotation budget. Specifically, when no ready-made ground truth exists, how can high-quality ground truth be inferred from labels supplied by multiple annotators of differing accuracy, so that a model can be fine-tuned without sacrificing label quality or model performance?

### Problem Background

1. **High cost of data annotation**: Modern machine-learning models require large amounts of high-quality, accurately annotated data. Such data usually come from expert annotation or crowdsourcing, but expert annotation is costly and the quality of crowdsourced annotation varies.
2. **Problem of noisy labels**: Large-scale deep-learning models trained on noisy labels are prone to overfitting, which degrades model performance.
3. **Limited annotation budget**: Under a fixed budget, the key question is how to obtain as much high-quality labeled data as possible from limited annotation resources.

### Paper's Solution

The authors propose a method called Ground Truth Extension (GTX), which addresses the problem in two ways:

- **Using the naive Bayes model**: Given prior (historical) estimates of annotator accuracy, a simple naive Bayes model estimates the true label of each data point together with a confidence in that estimate.
- **Optimizing the annotation strategy**: Two strategies (confidence threshold and uncertainty sampling) dynamically select which data points need further annotation, minimizing the number of annotations required while improving label quality.

### Experimental Verification

The paper verifies the effectiveness of GTX through a series of experiments:

- **Synthetic data experiment**: GTX is compared with other common label aggregation methods (such as majority voting and weighted majority voting) under different annotator accuracies and annotation budgets.
- **Industrial image classification task**: In a real industrial image classification task, the labels generated by GTX are used to fine-tune a pre-trained EfficientNet-b0 model, whose performance is then evaluated.

### Conclusion

GTX generates high-quality ground truth more efficiently under a limited annotation budget, which in turn improves model fine-tuning. Its advantage is most pronounced when annotator accuracy is high and the budget is limited.

### Related Formulas

- Label probability:

\[ P(Y_i = y | L(i)) \propto P(Y_i = y) P(L(i) | Y_i = y) \]

where

\[ P(L(i) | Y_i = y) = \prod_{j \in J(i)} \alpha_j^{I[y_j^i = y]} (1 - \alpha_j)^{I[y_j^i \neq y]} \]

Here \( I[\cdot] \) is the indicator function, equal to 1 when the condition is true and 0 otherwise. From context, \( L(i) \) is the set of labels collected for data point \( i \), \( J(i) \) is the set of annotators who labeled it, \( y_j^i \) is the label annotator \( j \) gave to data point \( i \), and \( \alpha_j \) is the estimated accuracy of annotator \( j \).

- Uncertainty (one minus the confidence in the most probable label):

\[ u_i = 1 - \max_y \hat{P}(Y_i = y | L(i)) \]
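To make the two formulas concrete, here is a minimal Python sketch of the posterior and uncertainty computation for a single data point. It is an illustration under stated assumptions, not the authors' implementation: the function names `posterior`, `uncertainty`, and `most_uncertain`, the dictionary-based inputs, the uniform default prior, and the example annotators and class names are all hypothetical. The likelihood follows the formula exactly as written in the summary, and each accuracy is assumed to lie strictly between 0 and 1.

```python
import math


def posterior(labels, accuracies, classes, prior=None):
    """Naive-Bayes posterior over the true label of one data point.

    labels:     dict annotator_id -> label given by that annotator (y_j^i)
    accuracies: dict annotator_id -> estimated accuracy alpha_j, in (0, 1)
    classes:    list of possible labels
    prior:      dict label -> P(Y = y); uniform if omitted (an assumption)
    """
    if prior is None:
        prior = {y: 1.0 / len(classes) for y in classes}
    log_scores = {}
    for y in classes:
        score = math.log(prior[y])
        for j, y_j in labels.items():
            alpha = accuracies[j]
            # alpha_j if the annotator agrees with y, (1 - alpha_j) otherwise
            score += math.log(alpha if y_j == y else 1.0 - alpha)
        log_scores[y] = score
    # Normalize in log space for numerical stability
    top = max(log_scores.values())
    unnorm = {y: math.exp(s - top) for y, s in log_scores.items()}
    total = sum(unnorm.values())
    return {y: v / total for y, v in unnorm.items()}


def uncertainty(post):
    """u_i = 1 - max_y P(Y_i = y | L(i))."""
    return 1.0 - max(post.values())


def most_uncertain(posteriors):
    """Hypothetical helper: id of the data point with the largest u_i,
    one simple reading of 'uncertainty sampling'."""
    return max(posteriors, key=lambda i: uncertainty(posteriors[i]))


# Example: three hypothetical annotators label one image.
accuracies = {"a": 0.9, "b": 0.8, "c": 0.6}
labels = {"a": "defect", "b": "defect", "c": "ok"}
post = posterior(labels, accuracies, classes=["defect", "ok"])
u = uncertainty(post)
print(round(post["defect"], 4), round(u, 4))  # 0.96 0.04
```

With a uniform prior, the unnormalized scores are 0.5 × 0.9 × 0.8 × 0.4 = 0.144 for "defect" and 0.5 × 0.1 × 0.2 × 0.6 = 0.006 for "ok", giving a posterior of 0.96 and an uncertainty of 0.04. Under a confidence-threshold strategy, a data point like this could be accepted without further labels if the threshold were, say, 0.95; the threshold value here is only an example.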

Fine-tuning Vision Classifiers On A Budget

Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels

Vision-Language Models are Strong Noisy Label Detectors

Fine-Tuning is Fine, if Calibrated

Improving Classification Performance With Human Feedback: Label a few, we label the rest

Tuning Vision-Language Models with Multiple Prototypes Clustering

Are Labels Always Necessary for Classifier Accuracy Evaluation?

Multifaceted Analysis of Fine-Tuning in Deep Model for Visual Recognition

Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models

Label Smarter, Not Harder: CleverLabel for Faster Annotation of Ambiguous Image Classification with Higher Quality

Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data

VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness

Learning Image Labels On-the-fly for Training Robust Classification Models

Leveraging Human-Machine Interactions for Computer Vision Dataset Quality Enhancement

Balancing Label Quantity and Quality for Scalable Elicitation

Curriculum Fine-tuning of Vision Foundation Model for Medical Image Classification Under Label Noise

Evaluating Classifiers Without Expert Labels

Trimming the Risk: Towards Reliable Continuous Training for Deep Learning Inspection Systems

Improved Visual Fine-tuning with Natural Language Supervision

An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets

Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers