Abstract: Meta-learning has recently promoted few-shot text classification, which identifies target classes based on information transferred from source classes through a series of small tasks or episodes. Existing works constructing their meta-learner on Prototypical Networks need improvement in learning discriminative text representations between similar classes that may lead to conflicts in label prediction. The overfitting problems caused by a few training instances need to be adequately addressed. In addition, efficient episode sampling procedures that could enhance few-shot training should be utilized. To address the problems mentioned above, we first present a contrastive learning framework that simultaneously learns discriminative text representations via supervised contrastive learning while mitigating the overfitting problem via unsupervised contrastive regularization, and then we build an efficient self-paced episode sampling approach on top of it to include more difficult episodes as training progresses. Empirical results on 8 few-shot text classification datasets show that our model outperforms the current state-of-the-art models. The extensive experimental analysis demonstrates that our supervised contrastive representation learning and unsupervised contrastive regularization techniques improve the performance of few-shot text classification. The episode-sampling analysis reveals that our self-paced sampling strategy improves training efficiency.
What problem does this paper attempt to address?
This paper targets three weaknesses of existing few-shot text classification methods: conflicting label predictions among fine-grained, semantically similar classes; overfitting caused by the small number of training instances; and inefficient episode sampling. Specifically:
1. **Label Prediction Conflict**: Existing methods based on Prototypical Networks may produce very similar text representations for classes that differ only slightly in meaning, which leads to incorrect label assignment at prediction time. For example, in intent classification, the sentences "Who covered 'A Toast to the Distant Place'" (music query intent) and "Play 'A Toast to the Distant Place'" (music play intent) are semantically similar but belong to different intent classes. If both sentences are sampled into the same query set, they may receive similar metric values (distances to the class prototypes), resulting in misclassification.
2. **Overfitting Problem**: Because few-shot learning provides only a limited number of training instances, the model is prone to overfitting the distribution of the source classes. This overfitting occurs at both the instance level and the task level: the model becomes overconfident on the training tasks and does not generalize well to unseen target tasks.
3. **Efficient Sampling Problem**: In few-shot text classification, the difficulty of randomly sampled tasks usually follows a normal distribution: most tasks are relatively easy, and only a few are hard. As training progresses, the loss on randomly sampled tasks drops rapidly, which can cause training to stagnate and reduce learning efficiency.
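The label prediction conflict in item 1 can be made concrete with a minimal sketch of prototype-based classification. This is an illustration under stated assumptions, not the authors' implementation: the 2-D embeddings are made up, the class names are hypothetical, and squared Euclidean distance with a softmax is used as the metric, as in standard Prototypical Networks.

```python
import math

def mean_vector(vectors):
    """Average equal-length vectors to form a class prototype."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def class_probabilities(query, prototypes):
    """Softmax over negative squared distances to each prototype."""
    logits = {label: -squared_distance(query, p) for label, p in prototypes.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Toy support set: made-up embeddings, two of the classes lie close together.
support = {
    "music_query": [[1.0, 1.0], [1.2, 0.8]],
    "music_play": [[1.1, 1.1], [1.3, 0.9]],
    "weather": [[-2.0, 0.5], [-1.8, 0.7]],
}
prototypes = {label: mean_vector(vs) for label, vs in support.items()}

# A query embedded between the two music prototypes.
query = [1.12, 0.95]
probs = class_probabilities(query, prototypes)
for label, p in probs.items():
    print(f"{label}: {p:.3f}")
```

In this toy setup the two music intents receive nearly equal probability while the unrelated class is effectively ruled out, so a small shift in the query embedding can flip the predicted label. That is the situation more discriminative representations are meant to avoid.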
To address these problems, the authors propose a self-paced contrastive learning framework named SPContrastNet, with the following main contributions:
1. **Contrastive Learning Framework**: Learn more discriminative text representations through supervised contrastive learning, reducing the prediction conflicts caused by similar text representations.
2. **Unsupervised Contrastive Regularization**: Introduce two unsupervised contrastive losses as regularization terms, one at the task level and one at the instance level, to alleviate overfitting and help the model learn more distinct task and instance representations.
3. **Self-paced Sampling Strategy**: Propose a self-paced sampling strategy that gradually increases task difficulty, improving training efficiency and avoiding the loss of learning efficiency that occurs when sampled tasks quickly become too easy during training.
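The idea behind the first contribution can be sketched with a generic supervised contrastive loss. The summary does not give the paper's exact formulation, so the function below follows the widely used general form (cosine similarity, a temperature, averaging over same-label positives); the function names, the temperature value, and the toy embeddings are assumptions made for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def supervised_contrastive_loss(embeddings, labels, temperature=0.5):
    """Generic supervised contrastive loss (not the paper's exact form).

    For each anchor, instances sharing its label are positives and all
    other instances are negatives. Anchors with no positive are skipped.
    """
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of the anchor to every other instance.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n) if j != i
        }
        log_denominator = math.log(sum(math.exp(s) for s in sims.values()))
        # Average negative log-probability of picking each positive.
        total += -sum(sims[j] - log_denominator for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0

labels = [0, 0, 1, 1]
# Same-label embeddings close together, classes far apart.
separated = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Same-label embeddings far apart, classes interleaved.
entangled = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

loss_separated = supervised_contrastive_loss(separated, labels)
loss_entangled = supervised_contrastive_loss(entangled, labels)
print(loss_separated < loss_entangled)
```

The loss is lower when instances of the same class cluster together and different classes sit apart, which is the property that reduces conflicts between similar classes. The unsupervised regularization terms in the second contribution are not sketched here because the summary does not describe how their positive pairs are constructed.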
In experiments on 8 few-shot text classification datasets, SPContrastNet outperforms the current state-of-the-art models, and the accompanying analysis indicates that it learns more discriminative text representations, alleviates overfitting, and improves training efficiency.
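The self-paced sampling strategy listed among the contributions can be illustrated with a small curriculum-style sampler. The summary does not specify how the paper scores or selects episodes, so everything here is a hypothetical sketch: each candidate episode is assumed to carry a difficulty score in [0, 1], and a threshold that grows linearly with the training step decides which episodes are eligible.

```python
import random

def difficulty_threshold(step, total_steps, start=0.3, end=1.0):
    """Linearly raise the admitted difficulty from `start` to `end`."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac

def sample_episode(candidates, step, total_steps, rng):
    """Pick one episode whose difficulty is within the current threshold.

    `candidates` is a list of (episode_id, difficulty) pairs; both the ids
    and the difficulty scores are illustrative assumptions.
    """
    limit = difficulty_threshold(step, total_steps)
    admitted = [c for c in candidates if c[1] <= limit]
    if not admitted:
        # Fall back to the easiest episode if nothing is admitted yet.
        admitted = [min(candidates, key=lambda c: c[1])]
    return rng.choice(admitted)

rng = random.Random(0)
candidates = [("e1", 0.10), ("e2", 0.25), ("e3", 0.60), ("e4", 0.95)]

# Early in training only easy episodes are eligible.
early = {sample_episode(candidates, 0, 100, rng)[0] for _ in range(200)}
# Late in training harder episodes are included as well.
late = {sample_episode(candidates, 100, 100, rng)[0] for _ in range(200)}
print(sorted(early), sorted(late))
```

A linear schedule is only one possible choice. The point of the sketch is that the pool of eligible episodes widens over time, so harder episodes enter training as it progresses instead of the sampler drawing mostly easy tasks throughout.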