Fine-tuning Vision Classifiers On A Budget

Sunil Kumar, Ted Sandler, Paulina Varshavskaya
2024-10-01
Abstract: Fine-tuning modern computer vision models requires accurately labeled data for which the ground truth may not exist, but a set of multiple labels can be obtained from labelers of variable accuracy. We tie the notion of label quality to confidence in labeler accuracy and show that, when prior estimates of labeler accuracy are available, using a simple naive-Bayes model to estimate the true labels allows us to label more data on a fixed budget without compromising label or fine-tuning quality. We present experiments on a dataset of industrial images that demonstrate that our method, called Ground Truth Extension (GTX), enables fine-tuning ML models using fewer human labels.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to fine-tune modern computer vision models efficiently under a limited annotation budget. Specifically, when no ready-made ground truth exists, it asks how to infer high-quality ground truth from labels supplied by multiple annotators of differing accuracy, and how to use it for fine-tuning without sacrificing label quality or model performance.

### Problem Background

1. **High cost of data annotation**: Modern machine-learning models require large amounts of high-quality, accurately annotated data. Such data usually comes from expert annotation or crowdsourcing, but expert annotation is costly and the quality of crowdsourced annotation varies.
2. **Problem of noisy labels**: Training on noisy labels, especially with large-scale deep-learning models, is prone to overfitting, which degrades model performance.
3. **Limited annotation budget**: Under a fixed budget, the key question is how to make the most of limited annotation resources and obtain as much high-quality labeled data as possible.

### Paper's Solution

The authors propose a method called Ground Truth Extension (GTX), which works as follows:

- **Using the naive Bayes model**: Starting from historical estimates of annotator accuracy, a simple naive Bayes model estimates the true label of each data point together with a confidence in that estimate.
- **Optimizing the annotation strategy**: Two strategies (confidence threshold and uncertainty sampling) dynamically select which data points need further annotation, minimizing the number of annotations required while improving label quality.

### Experimental Verification

The paper evaluates GTX in two sets of experiments:

- **Synthetic data experiment**: GTX is compared with other common label aggregation methods (such as majority voting and weighted majority voting) under different annotator accuracies and annotation budgets.
- **Industrial image classification task**: On a real industrial image classification task, labels generated by GTX are used to fine-tune a pre-trained EfficientNet-b0 model, whose performance is then evaluated.

### Conclusion

GTX generates high-quality ground truth more efficiently under a limited annotation budget, which in turn improves model fine-tuning. The advantage is most pronounced when annotator accuracy is high and the budget is limited.

### Related Formulas

- Label probability calculation formula:

  \[ P(Y_i = y \mid L(i)) \propto P(Y_i = y) \, P(L(i) \mid Y_i = y) \]

  where

  \[ P(L(i) \mid Y_i = y) = \prod_{j \in J(i)} \alpha_j^{I[y_j^i = y]} (1 - \alpha_j)^{I[y_j^i \neq y]} \]

  Here, \( L(i) \) is the set of labels collected for item \( i \), \( J(i) \) is the set of annotators who labeled it, \( y_j^i \) is the label annotator \( j \) gave to item \( i \), \( \alpha_j \) is the estimated accuracy of annotator \( j \), and \( I[\cdot] \) is an indicator function that takes the value 1 when the condition is true and 0 otherwise.

- Uncertainty calculation formula (one minus the confidence in the most probable label):

  \[ u_i = 1 - \max_y \hat{P}(Y_i = y \mid L(i)) \]
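
The label probability and uncertainty formulas in this summary can be illustrated with a short Python sketch. This is only a minimal reading of the two formulas as written here, not the authors' implementation: the function names `gtx_posterior` and `uncertainty`, the uniform default prior, and the example accuracies are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def gtx_posterior(
    labels: Sequence[int],
    accuracies: Sequence[float],
    num_classes: int,
    prior: Optional[Sequence[float]] = None,
) -> list:
    """Normalized P(Y_i = y | L(i)) for one item, per the formula above.

    labels[k] is the class chosen by the k-th annotator of this item and
    accuracies[k] is that annotator's estimated accuracy (alpha_j).
    """
    if prior is None:
        # Assumption: uniform class prior when none is supplied.
        prior = [1.0 / num_classes] * num_classes

    scores = []
    for y in range(num_classes):
        score = prior[y]
        for label, alpha in zip(labels, accuracies):
            # alpha_j if the annotator agrees with y, (1 - alpha_j) otherwise.
            score *= alpha if label == y else (1.0 - alpha)
        scores.append(score)

    total = sum(scores)
    if total == 0.0:
        raise ValueError("all classes have zero probability; check accuracies")
    # Normalize so the proportional scores sum to 1.
    return [s / total for s in scores]


def uncertainty(posterior: Sequence[float]) -> float:
    """u_i = 1 - max_y P(Y_i = y | L(i))."""
    return 1.0 - max(posterior)


# Hypothetical example: three annotators label one binary item.
example_posterior = gtx_posterior(
    labels=[1, 1, 0], accuracies=[0.9, 0.8, 0.6], num_classes=2
)
example_uncertainty = uncertainty(example_posterior)
```

With these made-up inputs the posterior is approximately `[0.04, 0.96]` and the uncertainty approximately `0.04`. Under an uncertainty-sampling strategy, items with a larger `u_i` would be the natural candidates for an additional label, while a confidence-threshold strategy would stop labeling an item once `1 - u_i` exceeds a chosen threshold; the exact selection rules used in the paper are not given in this summary.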