Abstract: Visual Question Answering (VQA) aims to answer a natural language question about a given image by understanding multimodal content. However, the answer quality of most existing visual-language pre-training (VLP) methods is still limited, mainly due to two problems: (1) Incompatibility. Upstream pre-training tasks are generally incompatible with downstream question answering tasks, so knowledge from the language model does not transfer well to downstream tasks, which greatly limits performance in few-shot scenarios. (2) Under-fitting. These methods generally do not integrate human priors to complement the universal knowledge of language models, and therefore struggle to fit the challenging VQA problem and generate reliable answers. To address these issues, we propose HybridPrompt, a cloze- and verify-style hybrid prompt framework that bridges language models and human priors in prompt tuning for VQA. Specifically, we first rewrite the input questions as cloze-style prompts to narrow the gap between the upstream pre-training tasks and the downstream VQA task, which ensures that the universal knowledge in the language model can be better transferred to the subsequent human prior-guided prompt tuning. Then, imitating the cognitive process of the human brain, we introduce topic- and sample-related priors to construct a dynamic learnable prompt template for human prior-guided prompt learning. Finally, we add fixed-length learnable free parameters to further enhance the generalizability and scalability of prompt learning in the VQA model. Experimental results verify the effectiveness of HybridPrompt, showing that it achieves competitive performance against previous methods on the widely used VQAv2 dataset and obtains new state-of-the-art results. Our code is released at: https://github.com/zhizhi111/hybrid.
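
To make the cloze-style reformulation concrete, the following is a minimal sketch of how a VQA question might be rewritten as a fill-in-the-blank statement that matches a masked language modeling objective. It is an illustration only and not the released HybridPrompt code: the function name `to_cloze`, the `[MASK]` placeholder, and the two regular-expression rules are assumptions chosen for this example, and the abstract does not specify the actual conversion rules.

```python
import re

MASK = "[MASK]"  # assumed mask token; the real tokenizer's mask token may differ


def to_cloze(question: str, mask: str = MASK) -> str:
    """Rewrite a question as a statement with a masked answer slot.

    Hypothetical rule-based sketch: two hand-written patterns plus a
    generic fallback that appends an answer slot to the question.
    """
    q = question.strip().rstrip("?").strip()

    # "What <attribute> is <object>" -> "The <attribute> of <object> is [MASK]."
    m = re.match(r"^what (\w+) is (.+)$", q, flags=re.IGNORECASE)
    if m:
        attribute, obj = m.group(1), m.group(2)
        return f"The {attribute} of {obj} is {mask}."

    # "How many <things> are/is <rest>" -> "There are [MASK] <things> <rest>."
    m = re.match(r"^how many (.+?) (?:are|is) (.+)$", q, flags=re.IGNORECASE)
    if m:
        things, rest = m.group(1), m.group(2)
        return f"There are {mask} {things} {rest}."

    # Fallback: keep the question and append a masked answer slot.
    return f"{q}? The answer is {mask}."


if __name__ == "__main__":
    for example in (
        "What color is the car?",
        "How many dogs are in the picture?",
        "Is the man smiling?",
    ):
        print(to_cloze(example))
```

In a full pipeline, the resulting statement would be fed to the language model together with the image features, and the prediction at the masked position would be mapped to an answer; the topic- and sample-related priors and the learnable free parameters described in the abstract are not modeled in this sketch.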

Eliciting Knowledge from Pretrained Language Models for Prototypical Prompt Verbalizer

Prototypical Verbalizer for Prompt-based Few-shot Tuning

Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification

A Novel Prompt-tuning Method: Incorporating Scenario-specific Concepts into a Verbalizer

Rethinking Visual Prompt Learning as Masked Visual Token Modeling

What Makes Pre-trained Language Models Better Zero/Few-shot Learners?

Evolutionary Verbalizer Search for Prompt-based Few Shot Text Classification

HybridPrompt: Bridging Language Models and Human Priors in Prompt Tuning for Visual Question Answering

Prompting through Prototype: A Prototype-based Prompt Learning on Pretrained Vision-Language Models

Knowledge-Enhanced Prompt Learning for Few-Shot Text Classification

Effective Structured Prompting by Meta-Learning and Representative Verbalizer

Manifold-based Verbalizer Space Re-embedding for Tuning-free Prompt-based Classification

SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics

TransPrompt V2: Transferable Prompt-based Fine-tuning for Few-shot Text Classification

Revisiting Prompt Pretraining of Vision-Language Models

Exploring Lottery Prompts for Pre-trained Language Models

Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

AdaPrompt: Adaptive Model Training for Prompt-based NLP

Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models

BayesPrompt: Prompting Large-Scale Pre-Trained Language Models on Few-shot Inference via Debiased Domain Abstraction