DynamicER: Resolving Emerging Mentions to Dynamic Entities for RAG

Jinyoung Kim, Dayoon Ko, Gunhee Kim
2024-10-15
Abstract: In the rapidly evolving landscape of language, resolving new linguistic expressions in continuously updating knowledge bases remains a formidable challenge. This challenge becomes critical in retrieval-augmented generation (RAG) with knowledge bases, as emerging expressions hinder the retrieval of relevant documents, leading to generator hallucinations. To address this issue, we introduce a novel task aimed at resolving emerging mentions to dynamic entities and present DynamicER benchmark. Our benchmark includes dynamic entity mention resolution and entity-centric knowledge-intensive QA task, evaluating entity linking and RAG model's adaptability to new expressions, respectively. We discovered that current entity linking models struggle to link these new expressions to entities. Therefore, we propose a temporal segmented clustering method with continual adaptation, effectively managing the temporal dynamics of evolving entities and emerging mentions. Extensive experiments demonstrate that our method outperforms existing baselines, enhancing RAG model performance on QA task with resolved mentions.
Computation and Language, Artificial Intelligence
### What problem does this paper attempt to solve?

This paper addresses how to resolve newly emerging language expressions (emerging mentions) and link them to dynamic entities in a rapidly changing language environment. It focuses on the Retrieval-Augmented Generation (RAG) setting, where emerging mentions cause document retrieval to fail, which in turn degrades the output of the generation model.

#### Main challenges

1. **Resolving emerging mentions**: New expressions keep appearing over time, and many of them are alternative names for an existing entity. For example, "Elon Musk" can be referred to as "Tesla CEO", "tech billionaire", or "Martian". Traditional entity-linking models struggle to link such mentions to the correct entity.
2. **Adapting to dynamic knowledge bases**: In a continuously updated knowledge base, the attributes of an entity also change. For example, "Elon Musk" was initially known as "PayPal co-founder", later as "Hyperloop visionary", and most recently as "Twitter owner". A system therefore has to track these changes in order to resolve new mentions accurately.

#### Solutions

To address these challenges, the authors introduce a new task and benchmark dataset named **DynamicER**, which targets resolving emerging mentions and linking them to dynamic entities. The specific contributions are:

1. **The DynamicER benchmark dataset**: The dataset contains a dynamic entity mention resolution task and an entity-centric question-answering task, which evaluate how well entity-linking models and RAG models, respectively, adapt to new expressions.
2. **A temporal segmented clustering method**: The method clusters mentions per time segment and adapts continually, so that it can manage entities that evolve over time together with their emerging mentions. Because it accounts for the temporal dynamics of entities, it separates entities and mentions more accurately.
3. **Experimental verification**: Extensive experiments show that the method outperforms existing baselines, and that resolving emerging mentions improves RAG model performance on the question-answering task.

### Summary

The paper's core problem is resolving and linking emerging mentions in dynamic knowledge bases, especially within the RAG framework, so that the generation model retrieves the relevant documents and avoids the retrieval failures and hallucinations that emerging mentions would otherwise cause.
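The idea of processing mentions per time segment while continually adapting entity representations can be illustrated with a toy sketch. This is not the authors' implementation: it swaps in a simple nearest-representative assignment with a moving-average update in place of the paper's clustering procedure. The 2-D vectors stand in for learned embeddings, and `resolve_by_segment`, `threshold`, `rate`, and the "Other Entity" entry are all hypothetical names and values chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def resolve_by_segment(entities, segments, threshold=0.8, rate=0.5):
    """Link mentions to entities one time segment at a time.

    entities: {entity_name: vector}, the initial entity representations.
    segments: list of {mention: vector} dicts in chronological order.
    rate:     how far an entity moves toward the mentions linked in a
              segment (0.0 disables adaptation). Hypothetical parameter.
    """
    reps = {name: list(vec) for name, vec in entities.items()}
    resolved = {}
    for segment in segments:
        linked = {name: [] for name in reps}
        for mention, vec in segment.items():
            best = max(reps, key=lambda name: cosine(reps[name], vec))
            if cosine(reps[best], vec) >= threshold:
                resolved[mention] = best
                linked[best].append(vec)
            else:
                resolved[mention] = None  # left unresolved
        # Adaptation step: shift each entity toward the mean of the
        # mentions linked to it in this segment.
        for name, vecs in linked.items():
            if vecs:
                mean = [sum(col) / len(vecs) for col in zip(*vecs)]
                reps[name] = [(1 - rate) * r + rate * m
                              for r, m in zip(reps[name], mean)]
    return resolved


# Toy vectors (assumed, not from the paper): the mentions drift away
# from the initial entity representation over three time segments.
entities = {"Elon Musk": [1.0, 0.0], "Other Entity": [-1.0, 0.0]}
segments = [
    {"PayPal co-founder": [0.9, 0.3]},
    {"Hyperloop visionary": [0.7, 0.6]},
    {"Twitter owner": [0.5, 0.8]},
]

adaptive = resolve_by_segment(entities, segments, rate=0.5)
static = resolve_by_segment(entities, segments, rate=0.0)
print("adaptive:", adaptive)
print("static:  ", static)
```

In this toy setup the adaptive run links all three mentions to "Elon Musk", while the static run (no adaptation) links only the earliest one, because the later mentions have drifted below the similarity threshold relative to the original representation. Once a mention is resolved, a RAG pipeline could query the retriever with the linked entity name instead of the raw emerging expression.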