LLM-TAKE: Theme Aware Keyword Extraction Using Large Language Models

Reza Yousefi Maragheh, Chenhao Fang, Charan Chand Irugu, Parth Parikh, Jason Cho, Jianpeng Xu, Saranyan Sukumar, Malay Patel, Evren Korpeoglu, Sushant Kumar, Kannan Achan
2023-12-02
Abstract: Keyword extraction is one of the core tasks in natural language processing. Classic extraction models are notorious for having a short attention span, which makes it hard for them to infer relational connections among words and sentences that are far from each other. This, in turn, makes their usage prohibitive for generating keywords that are inferred from the context of the whole text. In this paper, we explore using Large Language Models (LLMs) to generate keywords for items that are inferred from the items' textual metadata. Our modeling framework includes several stages that refine the results by avoiding keywords that are non-informative or sensitive and by reducing the hallucinations common in LLMs. We call our LLM-based framework Theme-Aware Keyword Extraction (LLM-TAKE). We propose two variations of the framework for generating extractive and abstractive themes for products in an e-commerce setting. We perform an extensive set of experiments on three real data sets and show that our modeling framework can enhance accuracy-based and diversity-based metrics when compared with benchmark models.
Information Retrieval, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper primarily aims to address several key issues in the task of keyword extraction:

1. **Limitations of Classic Models**:
   - Classic keyword extraction models often suffer from a "short attention span," which makes it difficult for them to capture relationships between distant words in the text.
   - These models typically rely on domain-specific training data and therefore perform poorly on texts that are dissimilar to the training data.
2. **Generating Context-Aware Keywords**:
   - In an e-commerce environment, users need to quickly understand product features through keywords, which improves shopping efficiency and experience.
   - Existing methods struggle to generate keywords that reflect the theme of the entire text, especially when the keywords are abstractive rather than extractive.
3. **Reducing Hallucination**:
   - Large Language Models (LLMs), while capable of generating higher-quality keywords, may hallucinate, i.e., generate information that is irrelevant or inaccurate with respect to the input text.

To address these issues, the paper proposes a multi-stage framework, Theme-Aware Keyword Extraction (LLM-TAKE), which leverages large language models to generate context-aware keywords and applies a series of steps to filter out non-informative or sensitive keywords and reduce hallucination. Experimental results on three real-world datasets show that the framework improves accuracy-based and diversity-based metrics compared with benchmark models.
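The multi-stage idea described above (generate candidate keywords, then filter out non-informative, sensitive, and ungrounded ones) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the filter lists `NON_INFORMATIVE` and `SENSITIVE`, the token-overlap grounding check, and the `fake_llm` stand-in are all assumptions made for illustration, and the summary above does not specify how the paper's stages work internally.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical filter lists (assumptions; not taken from the paper).
NON_INFORMATIVE = {"product", "item", "great", "new"}
SENSITIVE = {"weapon"}


def normalize(phrase: str) -> str:
    """Lowercase a phrase and reduce it to space-separated alphanumeric tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", phrase.lower()))


def is_grounded(keyword: str, text: str) -> bool:
    """Extractive grounding check: every keyword token must occur in the text."""
    text_tokens = set(normalize(text).split())
    kw_tokens = normalize(keyword).split()
    return bool(kw_tokens) and all(tok in text_tokens for tok in kw_tokens)


def filter_keywords(candidates: Iterable[str], text: str) -> List[str]:
    """Drop duplicate, non-informative, sensitive, and ungrounded candidates."""
    kept: List[str] = []
    seen = set()
    for cand in candidates:
        kw = normalize(cand)
        if not kw or kw in seen:
            continue
        seen.add(kw)
        if kw in NON_INFORMATIVE or kw in SENSITIVE:
            continue
        if not is_grounded(kw, text):
            continue  # treated here as a possible hallucination
        kept.append(kw)
    return kept


def extract_keywords(text: str,
                     generate: Callable[[str], Iterable[str]]) -> List[str]:
    """Stage 1: generate candidates (e.g. with an LLM). Stage 2: filter them."""
    return filter_keywords(generate(text), text)


def fake_llm(text: str) -> List[str]:
    """Stand-in for a real LLM call; returns canned candidate keywords."""
    return ["Stainless Steel", "vacuum insulation", "product",
            "dishwasher safe", "stainless steel", "weapon"]


description = ("Stainless steel water bottle with vacuum insulation, "
               "keeps drinks cold for 24 hours.")
keywords = extract_keywords(description, fake_llm)
print(keywords)  # ['stainless steel', 'vacuum insulation']
```

The token-overlap check only suits an extractive setting, where every keyword is expected to appear in the source text. An abstractive variant would need a different grounding test (for example, a semantic similarity threshold), since valid abstractive themes need not occur verbatim in the item metadata.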