Abstract: Machine learning models are increasingly used across many fields and tasks because of their strong performance and generalization capabilities. Nonetheless, their success hinges on the availability of large volumes of annotated data, the creation of which is often labor-intensive, time-consuming, and expensive. Many active learning (AL) approaches have been proposed to address these challenges, but they often fail to fully leverage the information from the core phases of AL, such as training on the labeled set and querying new unlabeled samples. To bridge this gap, we propose a novel AL approach, Loss Prediction Loss with Gradient Norm (LPLgrad), designed to quantify model uncertainty effectively and improve the accuracy of image classification tasks. LPLgrad operates in two distinct phases: (i) the *Training Phase*, which aims to predict the loss for input features by jointly training a main model and an auxiliary model. Both models are trained on the labeled data to maximize the efficiency of the learning process, an aspect often overlooked in previous AL methods. This dual-model approach enhances the ability to extract complex input features and learn intrinsic patterns from the data effectively; (ii) the *Querying Phase*, which quantifies the uncertainty of the main model to guide sample selection. This is achieved by calculating the gradient norm of the entropy values for samples in the unlabeled dataset. Samples with the highest gradient norms are prioritized for labeling and subsequently added to the labeled set, improving the model's performance with minimal labeling effort. Extensive evaluations on real-world datasets demonstrate that the LPLgrad approach outperforms state-of-the-art methods by an order of magnitude in terms of accuracy on a small number of labeled images, while achieving comparable training and querying times in multiple image classification tasks.
### What problem does this paper attempt to solve?
This paper addresses the reliance of machine learning models on large amounts of labeled data in image classification tasks. Specifically, the authors point out:
1. **High cost of obtaining labeled data**: High-quality labeled data often requires a great deal of time and manpower, especially in fields such as medical imaging and speech recognition, which makes large-scale labeled data expensive and slow to obtain.
2. **Limitations of existing Active Learning (AL) methods**:
   - Although many existing AL methods reduce the manual labeling effort, they fail to fully use the information available in the core stages of AL (training on the labeled set and querying new unlabeled samples).
   - For example, the method based on Loss Prediction Loss (LPL) introduces a loss-prediction module, but it is unstable on large-scale datasets and its sample-selection criteria are not effective enough.

To solve these problems, the authors propose a new active learning method, **LPLgrad** (Loss Prediction Loss with Gradient Norm). LPLgrad improves on traditional AL methods in the following ways:
- **Joint training of the main model and the auxiliary model**: During the training stage, LPLgrad simultaneously trains a main model (for feature extraction) and an auxiliary model (for loss prediction) to improve the ability to learn complex input features.
- **Sample selection based on gradient norms**: During the query stage, LPLgrad quantifies the model's uncertainty by calculating the gradient norm of the entropy of each unlabeled sample's prediction and prioritizes the samples with the highest gradient norms for labeling. This approach captures the model's uncertainty more accurately and also reduces the selection of redundant data.
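
The query stage described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the main model is replaced by a toy linear softmax classifier (`weights`), the gradient is taken with respect to those weights, and the helper names (`entropy_grad_norm`, `select_for_labeling`) are hypothetical. For this toy model the gradient of the entropy has a closed form, so no autograd library is needed.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy with the natural logarithm (assumed base).
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_grad_norm(weights, x):
    # Assumption: toy linear classifier with logits z = W x.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(logits)
    h = entropy(probs)
    # Closed form for softmax outputs: dH/dz_k = -p_k * (log p_k + H).
    dz = [-p * (math.log(p) + h) if p > 0.0 else 0.0 for p in probs]
    # dH/dW is the outer product of dz and x, so its Frobenius norm
    # factorizes as ||dz|| * ||x||.
    norm_dz = math.sqrt(sum(d * d for d in dz))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    return norm_dz * norm_x

def select_for_labeling(weights, unlabeled, budget):
    # Rank unlabeled samples by gradient norm, highest first,
    # and return the indices of the top `budget` samples.
    scores = [entropy_grad_norm(weights, x) for x in unlabeled]
    ranked = sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]

if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]          # hypothetical 2-class model
    pool = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]
    print(select_for_labeling(W, pool, budget=2))
```

In a deep network the same score would typically be obtained by backpropagating the entropy through the model and taking the norm of the resulting parameter gradients; which parameters the paper uses is not stated in this summary.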
### Summary
The main goal of LPLgrad is to reduce the dependence on large amounts of labeled data by optimizing the sample-selection strategy in the active learning process, thereby improving the efficiency and accuracy of image classification tasks. Experimental results show that LPLgrad performs better than existing advanced methods on multiple datasets, especially when the labeling budget is limited.
### Formula summary
- **Loss function of the main model**:
\[
l_{\text{main}}=\frac{1}{N}\sum_{i = 1}^{N}L_{\text{CE}}(y_i,y_{\text{main},i})
\]
where \(L_{\text{CE}}\) is the cross-entropy loss function, \(y_i\) is the true label of the \(i\)-th sample, and \(y_{\text{main},i}\) is the main model's predicted output for that sample.
- **Loss function of the auxiliary model**:
\[
l_{\text{aux}}=\frac{1}{P}\sum_{i = 1}^{P}\max(0,M - d_i\cdot(l_{\text{aux},i}-l_{\text{main},i}))
\]
where \(M\) is the margin parameter, \(d_i=\max(0,l_{\text{main},i})\) is used to determine the direction of the margin penalty, and \(l_{\text{aux},i}\) and \(l_{\text{main},i}\) are the predicted losses of the auxiliary model and the main model for the \(i\)-th sample, respectively.
- **Total loss function**:
\[
L_{\text{total}}=l_{\text{aux}}+l_{\text{main}}
\]
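
The three loss formulas above can be transcribed directly into code. The sketch below is a literal reading of the formulas as written in this summary, not the authors' implementation. It assumes that \(l_{\text{main},i}\) is the main model's per-sample cross-entropy, that \(l_{\text{aux},i}\) is the auxiliary model's predicted loss for the same sample, that \(P = N\), and that the margin defaults to 1.0; all function names are hypothetical.

```python
import math

def cross_entropy(true_class, probs):
    # L_CE for one sample: negative log-probability of the true class.
    return -math.log(probs[true_class])

def main_loss(labels, batch_probs):
    # l_main: mean cross-entropy over the batch.
    # Also returns the per-sample losses for use by the auxiliary term.
    per_sample = [cross_entropy(y, p) for y, p in zip(labels, batch_probs)]
    return sum(per_sample) / len(per_sample), per_sample

def aux_loss(pred_losses, main_losses, margin):
    # l_aux as written above: mean of max(0, M - d_i * (l_aux_i - l_main_i)),
    # with d_i = max(0, l_main_i).
    terms = []
    for l_aux_i, l_main_i in zip(pred_losses, main_losses):
        d_i = max(0.0, l_main_i)
        terms.append(max(0.0, margin - d_i * (l_aux_i - l_main_i)))
    return sum(terms) / len(terms)

def total_loss(labels, batch_probs, pred_losses, margin=1.0):
    # L_total = l_aux + l_main (margin default is an assumption).
    l_main, per_sample = main_loss(labels, batch_probs)
    return aux_loss(pred_losses, per_sample, margin) + l_main

if __name__ == "__main__":
    labels = [0, 1]
    batch_probs = [[0.5, 0.5], [0.25, 0.75]]   # made-up softmax outputs
    pred_losses = [0.6, 0.4]                   # made-up auxiliary predictions
    print(total_loss(labels, batch_probs, pred_losses))
```

In an actual training loop both terms would be computed on tensors so that a single backward pass through `L_total` updates the main and auxiliary models together.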
- **Calculation of sample entropy**:
\[
H(P(y_i|x_i))=-\sum_{c = 1}^{C}P(y_i = c|x_i)\log P(y_i = c|x_i)
\]
where \(P(y_i = c|x_i)\) is the posterior probability that sample \(x_i\) belongs to class \(c\).
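
As a worked illustration (the probabilities are made up, and the natural logarithm is assumed because the base is not stated here), take \(C = 2\). A confident prediction \((0.9, 0.1)\) gives
\[
H=-(0.9\ln 0.9+0.1\ln 0.1)\approx 0.0948+0.2303\approx 0.325,
\]
while the uniform prediction \((0.5, 0.5)\) gives the maximum value
\[
H=-(0.5\ln 0.5+0.5\ln 0.5)=\ln 2\approx 0.693.
\]
Higher entropy therefore corresponds to a less certain prediction.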
- **Calculation of gradient norms**:
\[
g_i=\|\nabla_