Abstract: In recent years, there has been significant progress in developing pre-trained language models for NLP. However, these models often struggle when fine-tuned on small datasets. To address this issue, researchers have proposed various adaptation approaches. Prompt-based tuning is arguably the most common way, especially for larger models. Previous research shows that adding contrastive learning to prompt-based fine-tuning is effective as it helps the model generate embeddings that are more distinguishable between classes, and it can also be more sample-efficient as the model learns from positive and negative examples simultaneously. One of the most important components of contrastive learning is data augmentation, but unlike computer vision, effective data augmentation for NLP is still challenging. This paper proposes LM-CPPF, Contrastive Paraphrasing-guided Prompt-based Fine-tuning of Language Models, which leverages prompt-based few-shot paraphrasing using generative language models, especially large language models such as GPT-3 and OPT-175B, for data augmentation. Our experiments on multiple text classification benchmarks show that this augmentation method outperforms other methods, such as easy data augmentation, back translation, and multiple templates.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses the performance degradation that occurs when pre-trained language models (PLMs) are fine-tuned on small datasets. The authors propose LM-CPPF (Contrastive Paraphrasing-guided Prompt-based Fine-tuning of Language Models), which uses generative language models (especially large language models such as GPT-3 and OPT-175B) to produce prompt-based few-shot paraphrases for data augmentation. The paraphrases serve as additional views of each training example for contrastive learning, which improves the model's ability to discriminate between classes in text classification tasks.

### Background and Motivation

1. **Limitations of Pre-trained Language Models**: Although PLMs are trained with self-supervision on large-scale corpora and achieve strong results on many natural language processing (NLP) tasks, they often struggle when fine-tuned on small datasets.
2. **Advantages of Contrastive Learning**: Contrastive learning helps the model produce embeddings that are more distinguishable between classes, and it can be more sample-efficient because the model learns from positive and negative examples simultaneously.
3. **Challenges of Data Augmentation**: Unlike in computer vision, effective data augmentation for NLP remains challenging. Traditional methods (such as simple word replacement and back-translation) often alter the semantics of the input, which limits their effectiveness.

### Method Overview

1. **Few-shot Paraphrase Generation**: Large language models (such as GPT-3 and OPT-175B) are prompted with a few demonstrations to generate paraphrases that differ from the original sentence in vocabulary and syntactic structure but retain its meaning.
2. **Contrastive Learning**: The original sentence and its paraphrase are treated as different views of the same example within a contrastive learning framework, which improves the model's discriminative ability.
3. **Experimental Validation**: Experiments on multiple text classification benchmark datasets evaluate the effectiveness of LM-CPPF.

### Main Contributions

1. **A New Data Augmentation Method**: Large language models generate the paraphrases, addressing the semantic inconsistency introduced by traditional data augmentation methods.
2. **Combination with Contrastive Learning**: A contrastive learning framework improves the model's performance on small datasets.
3. **Experimental Evidence**: LM-CPPF outperforms other data augmentation methods, such as Easy Data Augmentation (EDA), Back-Translation (BT), and multiple templates, on multiple text classification tasks.

### Conclusion

The experiments indicate that LM-CPPF improves the fine-tuning of pre-trained language models on small datasets and outperforms the other augmentation methods compared. Beyond text classification, the approach also suggests directions for future research on few-shot learning and data augmentation.
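
To make the two ingredients of the method more concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical prompt template (`build_paraphrase_prompt`, with made-up `Original:` / `Paraphrase:` field names) and a common supervised contrastive loss in which same-label examples in a batch are positives and all others are negatives. The summary above does not specify the exact prompt wording or loss, so both are illustrative assumptions.

```python
import math


def build_paraphrase_prompt(demos, sentence, instruction="Paraphrase the sentence."):
    """Assemble a few-shot paraphrasing prompt (hypothetical template).

    demos: list of (original, paraphrase) demonstration pairs.
    sentence: the training sentence to be paraphrased by the model.
    """
    blocks = [
        f"{instruction}\nOriginal: {src}\nParaphrase: {tgt}"
        for src, tgt in demos
    ]
    # The last block is left open so a generative model can complete it.
    blocks.append(f"{instruction}\nOriginal: {sentence}\nParaphrase:")
    return "\n\n".join(blocks)


def _normalize(vector):
    # Scale to unit length; assumes the vector is not all zeros.
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch containing both views.

    embeddings: list of vectors (e.g. originals followed by paraphrases).
    labels: class label per vector; same-label pairs count as positives.
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other example.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    demos = [("The film was great.", "I really enjoyed the movie.")]
    print(build_paraphrase_prompt(demos, "The food was cold."))

    # Toy 2-D embeddings: two views per class, clustered by class.
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supervised_contrastive_loss(emb, [0, 0, 1, 1]))
```

In a real pipeline the prompt would be sent to a generative model, and the embeddings would come from the encoder being fine-tuned, with the contrastive term typically added to the prompt-based classification loss. In the toy example, labels that agree with the clusters should give a lower loss than labels that cut across them.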

LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning

PromptDA: Label-guided Data Augmentation for Prompt-based Few-shot Learners

Enhancing Black-Box Few-Shot Text Classification with Prompt-Based Data Augmentation

Making Pre-trained Language Models End-to-end Few-shot Learners with Contrastive Prompt Tuning

ConsPrompt: Exploiting Contrastive Samples for Fewshot Prompt Learning

Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning

Learning from Contrastive Prompts: Automated Optimization and Adaptation

Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

RIFF: Learning to Rephrase Inputs for Few-shot Fine-tuning of Language Models

APrompt: Attention Prompt Tuning for Efficient Adaptation of Pre-trained Language Models

Differentiable Data Augmentation for Contrastive Sentence Representation Learning

POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models

PCC: Paraphrasing with Bottom-k Sampling and Cyclic Learning for Curriculum Data Augmentation

PPT: Pre-trained Prompt Tuning for Few-shot Learning

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning

Scaled Prompt-Tuning for Few-Shot Natural Language Generation

Towards Unified Prompt Tuning for Few-shot Text Classification

EPA: Easy Prompt Augmentation on Large Language Models via Multiple Sources and Multiple Targets

Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners