POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models

Korawat Tanwisuth, Shujian Zhang, Huangjie Zheng, Pengcheng He, Mingyuan Zhou
2023-04-30
Abstract: Through prompting, large-scale pre-trained models have become more expressive and powerful, gaining significant attention in recent years. Though these big models have zero-shot capabilities, in general, labeled data are still required to adapt them to downstream tasks. To overcome this critical limitation, we propose an unsupervised fine-tuning framework to directly fine-tune the model or prompt on the unlabeled target data. We demonstrate how to apply our method to both language-augmented vision and masked-language models by aligning the discrete distributions extracted from the prompts and target data. To verify our approach's applicability, we conduct extensive experiments on image classification, sentiment analysis, and natural language inference tasks. Across 13 image-related tasks and 15 language-related ones, the proposed approach achieves consistent improvements over the baselines.
Machine Learning, Artificial Intelligence, Computation and Language, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the need for labeled data when adapting large-scale pre-trained models to downstream tasks. Although these models have zero-shot capabilities, adapting them to a specific task in practice usually still requires fine-tuning on labeled data, and high-quality labels are costly and time-consuming to obtain. To address this limitation, the authors propose POUF (Prompt-oriented unsupervised fine-tuning), an unsupervised framework that fine-tunes the model or the prompt directly on unlabeled target data. Specifically, POUF minimizes the statistical distance between the discrete distributions extracted from the prompts and from the target data. The method applies to language-augmented vision models and masked-language models, and it helps the model capture shifts in the target data by aligning class prototypes with target features in the latent space.

### Main Contributions

1. **A prompt-oriented unsupervised fine-tuning framework**: POUF directly fine-tunes large-scale pre-trained models with zero-shot capabilities on unlabeled target data.
2. **Demonstrated effectiveness across multiple tasks**: Extensive experiments verify POUF on image classification, sentiment analysis, and natural language inference.
3. **A detailed ablation study**: The ablations explain the contribution of each design decision in the method.

### Method Overview

The core idea of POUF is to align class prototypes with target features in the latent space, reducing the gap between the distribution derived from the prompts and the distribution of the target data. For language-augmented vision models, POUF aligns the class-specific language prompt representations with the target image features.
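
To make the idea of aligning prototypes with target features concrete, here is a minimal sketch in plain Python. It is not the paper's loss: it assumes a simple one-directional expected cost, where each target feature is softly assigned to the prototypes by a softmax over cosine similarities and pays `1 - cosine` for each assignment. The function names, the `temperature` value, and the toy 2-D vectors are all illustrative assumptions.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def alignment_cost(prototypes, features, temperature=0.1):
    """Average expected cost of assigning each target feature to the prototypes.

    Assumption for illustration: cost = 1 - cosine similarity, with soft
    assignments given by a temperature-scaled softmax over the similarities.
    """
    total = 0.0
    for f in features:
        sims = [cosine(f, p) for p in prototypes]
        weights = softmax([s / temperature for s in sims])
        total += sum(w * (1.0 - s) for w, s in zip(weights, sims))
    return total / len(features)

# Hypothetical prototypes (stand-ins for encoded class prompts) and two sets
# of stand-in target features: one close to the prototypes, one shifted.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
shifted = [[0.6, 0.4], [0.4, 0.6]]

print(alignment_cost(prototypes, aligned))
print(alignment_cost(prototypes, shifted))
```

Under these assumptions the cost is lower for the features that sit near a prototype, so lowering it during fine-tuning would pull target features and prototypes together. In a real setup the prototypes and features would come from the model's text and image encoders rather than hand-written vectors.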
For masked-language models, POUF aligns the masked token representations extracted from the language prompt with the text prototypes generated by the decoder head.

### Experimental Results

The paper reports experiments on image classification, sentiment analysis, and natural language inference, with image benchmarks including Office-31, Office-Home, and DomainNet. Across 13 image-related tasks and 15 language-related ones, POUF achieves consistent improvements over the baselines.

### Conclusion

POUF offers a way to fine-tune large-scale pre-trained models directly on unlabeled target data, without any labels, improving how well the model adapts to and performs on new tasks.
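
For the masked-language-model case mentioned in the Method Overview, the toy sketch below shows how a masked-token representation can be turned into a discrete distribution over classes. It assumes, for illustration only, that each class prototype is the decoder-head weight row of one label word. The vocabulary, the label words, the weights, and the hidden vector are all hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def label_word_probs(mask_hidden, decoder_weights, label_word_ids):
    """Score the masked-token vector against the label-word rows only,
    then normalize over classes instead of over the whole vocabulary."""
    logits = [
        sum(h * w for h, w in zip(mask_hidden, decoder_weights[i]))
        for i in label_word_ids
    ]
    return softmax(logits)

# Toy decoder head: one weight row per vocabulary entry (4 words, 3 dims).
decoder_weights = [
    [0.0, 0.0, 1.0],  # id 0: "the"
    [2.0, 0.0, 0.0],  # id 1: "great"    (assumed label word for positive)
    [0.0, 2.0, 0.0],  # id 2: "terrible" (assumed label word for negative)
    [0.5, 0.5, 0.0],  # id 3: "movie"
]
label_word_ids = [1, 2]

# Stand-in for the encoder output at the masked position of a prompt.
mask_hidden = [1.5, 0.2, 0.3]

probs = label_word_probs(mask_hidden, decoder_weights, label_word_ids)
print(probs)  # first entry is the positive class, second the negative class
```

The rows selected by `label_word_ids` play the role of text prototypes here, and the resulting class distribution is the kind of discrete distribution that an unsupervised alignment objective could act on.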