Abstract: Through prompting, large-scale pre-trained models have become more expressive and powerful, gaining significant attention in recent years. Though these big models have zero-shot capabilities, in general, labeled data are still required to adapt them to downstream tasks. To overcome this critical limitation, we propose an unsupervised fine-tuning framework to directly fine-tune the model or prompt on the unlabeled target data. We demonstrate how to apply our method to both language-augmented vision and masked-language models by aligning the discrete distributions extracted from the prompts and target data. To verify our approach's applicability, we conduct extensive experiments on image classification, sentiment analysis, and natural language inference tasks. Across 13 image-related tasks and 15 language-related ones, the proposed approach achieves consistent improvements over the baselines.

What problem does this paper attempt to address?

The main problem this paper addresses is that large-scale pre-trained models still need labeled data to adapt to downstream tasks. Although these models have zero-shot capabilities, adapting them to a specific task in practice usually still requires fine-tuning on labeled data, and high-quality labels are costly and time-consuming to obtain. To address this limitation, the authors propose POUF (Prompt-oriented unsupervised fine-tuning), an unsupervised fine-tuning framework that directly fine-tunes the model or the prompt on unlabeled target data. Specifically, POUF minimizes the statistical distance between the discrete distributions extracted from the prompts and the target data. The method applies to language-augmented vision models and masked language models, and it helps the model capture shifts in the target data by aligning class prototypes with target features in the latent space.

### Main Contributions

1. **A prompt-oriented unsupervised fine-tuning framework**: POUF directly fine-tunes large-scale pre-trained models with zero-shot capabilities on unlabeled target data.
2. **Demonstrated effectiveness across multiple tasks**: Extensive experiments verify POUF on image classification, sentiment analysis, and natural language inference.
3. **A detailed ablation study**: The ablations explain why the method's design decisions are effective.

### Method Overview

The core idea of POUF is to align class prototypes and target features in the latent space, reducing the distribution difference between the source domain and the target domain.

- For language-augmented vision models, POUF aligns the class-specific language prompt representations with the target image features.
- For masked language models, POUF aligns the masked-token representations extracted from the language prompt with the text prototypes generated by the decoder head.

### Experimental Results

The paper reports experiments on multiple datasets, including the image benchmarks Office-31, Office-Home, and DomainNet, and covers image classification as well as language tasks (sentiment analysis and natural language inference). Across 13 image-related tasks and 15 language-related ones, POUF achieves consistent improvements over the baselines while training only on unlabeled target data.

### Conclusion

POUF provides an effective way to fine-tune large-scale pre-trained models directly on unlabeled target data, without any labels, thereby improving the model's adaptability and performance on new tasks.
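To make the alignment idea from the method overview concrete, here is a minimal sketch in plain Python. It is not the paper's objective: the summary only says that a statistical distance between two discrete distributions is minimized, so the transport-style cost below (a softmax over cosine similarities, with `1 - cosine` as the cost) is an assumed stand-in. The names `alignment_cost` and `temperature`, and the toy vectors, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def alignment_cost(prototypes, features, temperature=0.1):
    """Assumed transport-style cost between prototypes and unlabeled features.

    Each feature spreads its mass over the prototypes via a softmax of
    cosine similarities, and each prototype spreads its mass over the
    features the same way. Moving mass costs 1 - cosine similarity.
    """
    sims = [[cosine(f, p) for p in prototypes] for f in features]

    # Direction 1: every target feature is transported to the prototypes.
    feat_to_proto = 0.0
    for row in sims:
        plan = softmax([s / temperature for s in row])
        feat_to_proto += sum(w * (1.0 - s) for w, s in zip(plan, row))
    feat_to_proto /= len(features)

    # Direction 2: every prototype is transported to the target features.
    proto_to_feat = 0.0
    for k in range(len(prototypes)):
        col = [row[k] for row in sims]
        plan = softmax([s / temperature for s in col])
        proto_to_feat += sum(w * (1.0 - s) for w, s in zip(plan, col))
    proto_to_feat /= len(prototypes)

    return feat_to_proto + proto_to_feat


# Toy data (hypothetical): two class prototypes and two sets of features.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # features close to the prototypes
shifted = [[0.7, 0.7], [0.6, 0.8]]   # features drifted away from them

print(alignment_cost(prototypes, aligned))
print(alignment_cost(prototypes, shifted))
```

The cost is lower for the features that sit close to the prototypes. In an actual fine-tuning loop, a quantity of this kind would be minimized by gradient descent over the prompt or model parameters using only unlabeled target data; the sketch only evaluates the cost and does not train anything.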

POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models
