Trends in Integration of Knowledge and Large Language Models: A Survey and Taxonomy of Methods, Benchmarks, and Applications

Zhangyin Feng, Weitao Ma, Weijiang Yu, Lei Huang, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, Ting Liu
2024-10-23
Abstract: Large language models (LLMs) exhibit superior performance on various natural language tasks, but they are susceptible to issues stemming from outdated data and domain-specific limitations. In order to address these challenges, researchers have pursued two primary strategies, knowledge editing and retrieval augmentation, to enhance LLMs by incorporating external information from different aspects. Nevertheless, there is still a notable absence of a comprehensive survey. In this paper, we propose a review to discuss the trends in integration of knowledge and large language models, including taxonomy of methods, benchmarks, and applications. In addition, we conduct an in-depth analysis of different methods and point out potential research directions in the future. We hope this survey offers the community quick access and a comprehensive overview of this research area, with the intention of inspiring future research endeavors.
Computation and Language
What problem does this paper attempt to address?
This paper addresses a set of challenges that large language models (LLMs) face on knowledge-intensive tasks. Although LLMs perform well across a variety of natural language tasks, they still have several significant problems:

1. **Out-of-date data**: An LLM's knowledge stops at a specific point in time, so it cannot access the latest world knowledge. For example, ChatGPT's parameters contain information only up to September 2021, so it is unaware of later developments.
2. **Difficulty in learning long-tail knowledge**: LLMs learn long-tail knowledge poorly, that is, facts and topics that appear only rarely in the training data.
3. **Inability to update parameters in a timely manner**: Once an LLM's parameters are trained, it is very difficult to update them to capture the latest changes in the world.
4. **Hallucination problem**: LLMs sometimes generate responses that are inconsistent with the facts, a phenomenon known as "hallucination".

To address these challenges, researchers have proposed two main strategies: **knowledge editing** and **retrieval augmentation**. Both aim to enhance the capabilities of LLMs by integrating external information. However, a comprehensive review of these methods is still lacking. The goal of this paper is therefore to provide a systematic review that discusses the trends in integrating knowledge with large language models, covering a taxonomy of methods, benchmarks, and applications, and that analyzes the different methods in depth and points out future research directions.