Abstract: Tags are pivotal in facilitating the effective distribution of multimedia content in applications such as search engines and recommendation systems. Recently, large language models (LLMs) have demonstrated impressive capabilities across a wide range of tasks. In this work, we propose TagGPT, a fully automated system capable of tag extraction and multimodal tagging in a completely zero-shot fashion. Our core insight is that, through elaborate prompt engineering, LLMs are able to extract and reason about proper tags given textual clues of multimodal data, e.g., OCR, ASR, and titles. Specifically, to automatically build a high-quality tag set that reflects user intent and interests for a specific application, TagGPT predicts large-scale candidate tags from a series of raw data by prompting LLMs, then filters them by frequency and semantics. Given a new entity that needs tagging for distribution, TagGPT offers two alternative options for zero-shot tagging: a generative method with late semantic matching against the tag set, and a selective method with early matching in prompts. Notably, TagGPT provides a system-level solution based on a modular framework equipped with a pre-trained LLM (GPT-3.5 used here) and a sentence embedding model (SimCSE used here), either of which can be seamlessly replaced with a more advanced model. TagGPT is applicable to various modalities of data in modern social media and showcases strong generalization ability to a wide range of applications. We evaluate TagGPT on publicly available datasets, i.e., Kuaishou and Food.com, and demonstrate the effectiveness of TagGPT compared to existing hashtags and off-the-shelf taggers. Project page: https://github.com/TencentARC/TagGPT.
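
The tag-set construction and the generative method with late semantic matching can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `embed` is a toy character-trigram stand-in for a sentence embedding model such as SimCSE, the candidate and generated tags are hard-coded where the real system would obtain them by prompting an LLM, and the names `build_tag_set`, `match_tags`, `min_count`, `dedup_threshold`, and `top_k` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding model: character-trigram counts."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_tag_set(candidates, min_count=2, dedup_threshold=0.6):
    """Filter candidate tags by frequency, then drop near-duplicates by similarity."""
    counts = Counter(tag.strip().lower() for tag in candidates)
    tag_set = []
    # Visit tags from most to least frequent so the more common variant is kept.
    for tag, count in counts.most_common():
        if count < min_count:
            continue  # frequency filter
        # Semantic filter: keep a tag only if it is not too close to a kept one.
        if all(cosine(embed(tag), embed(kept)) < dedup_threshold for kept in tag_set):
            tag_set.append(tag)
    return tag_set


def match_tags(generated, tag_set, top_k=1):
    """Late matching: map each free-form generated tag to its nearest tag-set entries."""
    matched = []
    for gen in generated:
        ranked = sorted(tag_set, key=lambda t: cosine(embed(gen), embed(t)), reverse=True)
        for tag in ranked[:top_k]:
            if tag not in matched:
                matched.append(tag)
    return matched


# Hypothetical candidates, standing in for tags an LLM would propose from raw data.
candidates = ["baking", "Baking", "bakings", "bakings", "pasta", "pasta", "hiking"]
tag_set = build_tag_set(candidates)

# Hypothetical free-form tag, standing in for LLM output on a new entity.
print(match_tags(["pasta dishes"], tag_set))
```

The selective method with early matching would instead place the tag set (or a retrieved subset of it) directly in the prompt and ask the LLM to choose among those entries, so no post-hoc similarity step is needed; that variant is not shown here.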

TagGPT: Large Language Models are Zero-shot Multimodal Taggers
