Large Language Models are Good Prompt Learners for Low-Shot Image Classification

Zhaoheng Zheng, Jingmin Wei, Xuefeng Hu, Haidong Zhu, Ram Nevatia
2024-04-03
Abstract: Low-shot image classification, where training images are limited or inaccessible, has benefited from recent progress on pre-trained vision-language (VL) models with strong generalizability, e.g. CLIP. Prompt learning methods built with VL models generate text features from the class names that only have confined class-specific information. Large Language Models (LLMs), with their vast encyclopedic knowledge, emerge as the complement. Thus, in this paper, we discuss the integration of LLMs to enhance pre-trained VL models, specifically on low-shot classification. However, the domain gap between language and vision blocks the direct application of LLMs. Thus, we propose LLaMP, Large Language Models as Prompt learners, that produces adaptive prompts for the CLIP text encoder, establishing it as the connecting bridge. Experiments show that, compared with other state-of-the-art prompt learning methods, LLaMP yields better performance on both zero-shot generalization and few-shot image classification, over a spectrum of 11 datasets. Code will be made available at: https://github.com/zhaohengz/LLaMP.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper examines how large language models (LLMs) can improve low-shot image classification, where training images are scarce or unavailable. Existing prompt learning methods build on pre-trained vision-language models such as CLIP and generate text features from class names alone. These features carry limited class-specific information, which makes fine-grained categories hard to distinguish. The paper introduces LLaMP (Large Language Models as Prompt learners), which uses an LLM to produce adaptive prompts for the CLIP text encoder and so supplies richer category-specific information. According to the reported experiments, LLaMP outperforms other state-of-the-art prompt learning methods on both zero-shot generalization and few-shot image classification across 11 datasets. In short, the paper addresses how to leverage the knowledge stored in LLMs to improve low-shot image classification.
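To make the role of text features concrete, the following is a minimal sketch of CLIP-style matching, in which an image feature is compared against one text feature per class by cosine similarity. It is not the LLaMP architecture: the feature vectors are hand-made toy values, the helper names (`cosine_logits`, `classify`) are hypothetical, and adding a "knowledge" vector to the class-name feature is only an assumed stand-in for whatever richer information an LLM-derived prompt would contribute.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_logits(image_feat, text_feats, temperature=0.01):
    """Cosine similarity between the image and each class text feature,
    divided by a temperature (0.01 is an assumed value for this toy)."""
    img = normalize(image_feat)
    return [
        sum(a * b for a, b in zip(img, normalize(t))) / temperature
        for t in text_feats
    ]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image_feat, class_names, text_feats):
    """Return the most probable class name and the full distribution."""
    probs = softmax(cosine_logits(image_feat, text_feats))
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs


# Toy data (assumption): two fine-grained classes whose name-only text
# features are identical, so class names alone cannot separate them.
class_names = ["sparrow", "finch"]
name_only_feats = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

# Hypothetical class-specific knowledge vectors, standing in for the extra
# information an LLM could supply about each class.
knowledge_feats = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
enriched_feats = [
    [n + k for n, k in zip(nf, kf)]
    for nf, kf in zip(name_only_feats, knowledge_feats)
]

image_feat = [1.0, 0.2, 0.9]  # toy image feature, closer to the "finch" direction

_, name_only_probs = classify(image_feat, class_names, name_only_feats)
enriched_label, enriched_probs = classify(image_feat, class_names, enriched_feats)
```

With the name-only features the two classes receive equal probability, while the enriched features let the classifier separate them. That is the intuition behind supplying richer class-specific text information, shown here under the stated toy assumptions only.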