Benchmarking Chinese Knowledge Rectification in Large Language Models

Tianhe Lu, Jizhan Fang, Yunzhi Yao, Xin Xu, Ningyu Zhang, Huajun Chen
DOI: https://doi.org/10.48550/arXiv.2409.05806
2024-09-10
Abstract: While Large Language Models (LLMs) exhibit remarkable generative capabilities, they are not without flaws, particularly in the form of hallucinations. This issue is even more pronounced when LLMs are applied to specific languages and domains. For example, LLMs may generate nonsensical information when handling Chinese ancient poetry, proverbs, or idioms, owing to the lack of specific knowledge. To this end, this paper introduces a benchmark for rectifying Chinese knowledge in LLMs via knowledge editing. Specifically, we introduce a new Chinese dataset, CKnowEdit, by collecting seven types of knowledge from various sources, including classical texts, idioms, and content from Baidu Tieba Ruozhiba, thereby accounting for the unique polyphony, antithesis, and logical constructs inherent in the Chinese language. Through the analysis of this dataset, we uncover the challenges faced by current LLMs in mastering Chinese. Furthermore, our evaluation of state-of-the-art knowledge editing techniques on this dataset unveils substantial scope for advancement in the rectification of Chinese knowledge. Code and dataset are available at https://github.com/zjunlp/EasyEdit.
Computation and Language, Artificial Intelligence, Information Retrieval, Machine Learning
What problem does this paper attempt to address?
This paper addresses the knowledge deficiencies of large language models (LLMs) in specific languages and domains, in particular their incorrect output on Chinese material such as ancient poetry, idioms, and proverbs. Specifically:

1. **Hallucination phenomenon**: Although LLMs show excellent generation capabilities, they are prone to hallucination (that is, generating nonsensical or incorrect information) when dealing with specific languages and domains. For example, when handling ancient Chinese poetry, idioms, or common sayings, LLMs may generate incorrect information because they lack the specific knowledge required.
2. **Cultural and linguistic specificity**: Chinese has its own ideographic characters, phonetic system, and literary forms such as poetry, which together form a rich and distinctive knowledge system. Existing LLMs perform poorly on this content and cannot accurately capture the cultural background and linguistic features of Chinese.
3. **Limitations of existing datasets**: Current knowledge editing methods and datasets mainly focus on English text and use structured facts, such as those from Wikipedia, as the basis for editing. Such datasets are usually built by translation and cannot fully reflect the distinctive features and cultural connotations of a specific language.

To address these problems, the authors construct a new Chinese dataset, CKnowEdit, which aims to correct the deficiencies of LLMs in handling Chinese through knowledge editing techniques. The dataset contains seven types of Chinese-specific knowledge: ancient poetry, idioms, proverbs, phonetic notation, classical Chinese, geographical knowledge, and content from Baidu Tieba Ruozhiba. This selection is intended to ensure that the data is both linguistically accurate and culturally appropriate.

By analyzing the CKnowEdit dataset, the authors reveal the challenges current LLMs face in mastering Chinese and evaluate the effectiveness of existing knowledge editing techniques, finding substantial room for improvement. The ultimate goal is more accurate, coherent, and reliable content generation, so that LLMs can be applied more effectively to Chinese.
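The seven knowledge types named above, and the general idea of checking whether an edit took effect, can be illustrated with a minimal Python sketch. Everything beyond the seven category names is an assumption made for illustration: the `EditRecord` fields, the `edit_success_rate` helper, and the exact-match scoring are hypothetical and are not taken from CKnowEdit or EasyEdit, whose actual schema and metrics may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable


class KnowledgeType(Enum):
    """The seven categories named in the summary above."""
    ANCIENT_POETRY = "ancient poetry"
    IDIOMS = "idioms"
    PROVERBS = "proverbs"
    PHONETIC_NOTATION = "phonetic notation"
    CLASSICAL_CHINESE = "classical Chinese"
    GEOGRAPHICAL_KNOWLEDGE = "geographical knowledge"
    RUOZHIBA = "Ruozhiba"


@dataclass(frozen=True)
class EditRecord:
    # Hypothetical fields; the real dataset schema may differ.
    knowledge_type: KnowledgeType
    prompt: str   # question posed to the model
    target: str   # answer the edited model should give


def edit_success_rate(
    records: Iterable[EditRecord],
    answer: Callable[[str], str],
) -> float:
    """Fraction of records whose answer exactly matches the target.

    Exact match is a simplifying assumption, not the paper's metric.
    """
    records = list(records)
    if not records:
        return 0.0
    hits = sum(1 for r in records if answer(r.prompt).strip() == r.target)
    return hits / len(records)


if __name__ == "__main__":
    # Illustrative record (not drawn from the dataset): the second line
    # of Li Bai's "Quiet Night Thought".
    records = [
        EditRecord(
            KnowledgeType.ANCIENT_POETRY,
            "“床前明月光”的下一句是什么？",
            "疑是地上霜",
        ),
    ]
    # Stand-in for an edited model: a fixed lookup table.
    lookup = {records[0].prompt: "疑是地上霜"}
    print(edit_success_rate(records, lambda p: lookup.get(p, "")))
```

In practice `answer` would wrap a call to the edited model, and a per-category breakdown could be obtained by grouping records on `knowledge_type` before scoring.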