Cultural Commonsense Knowledge for Intercultural Dialogues

Tuan-Phong Nguyen, Simon Razniewski, Gerhard Weikum
DOI: https://doi.org/10.1145/3627673.3679768
2024-07-23
Abstract: Despite recent progress, large language models (LLMs) still face the challenge of appropriately reacting to the intricacies of social and cultural conventions. This paper presents MANGO, a methodology for distilling high-accuracy, high-recall assertions of cultural knowledge. We judiciously and iteratively prompt LLMs for this purpose from two entry points, concepts and cultures. Outputs are consolidated via clustering and generative summarization. Running the MANGO method with GPT-3.5 as underlying LLM yields 167K high-accuracy assertions for 30K concepts and 11K cultures, surpassing prior resources by a large margin in quality and size. In an extrinsic evaluation for intercultural dialogues, we explore augmenting dialogue systems with cultural knowledge assertions. Notably, despite LLMs inherently possessing cultural knowledge, we find that adding knowledge from MANGO improves the overall quality, specificity, and cultural sensitivity of dialogue responses, as judged by human annotators. Data and code are available for download.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the difficulty that large language models (LLMs) still have in reacting appropriately to social and cultural conventions, especially in intercultural dialogue. It proposes MANGO, a methodology for distilling high-accuracy, high-recall assertions of cultural commonsense knowledge (CCSK) from LLMs. MANGO iteratively prompts an LLM from two entry points, concepts and cultures, and consolidates the outputs through clustering and generative summarization. The aim is to improve the quality, specificity, and cultural sensitivity of dialogue systems in intercultural settings, even though LLMs already possess some cultural knowledge of their own.
The main contributions of the paper are:
1. The MANGO method, which distills CCSK assertions from LLMs with high accuracy and high recall.
2. A resource built by running MANGO with GPT-3.5: 167,000 high-accuracy assertions covering 30,000 concepts and 11,000 cultures, surpassing prior resources by a large margin in both quality and size.
3. An extrinsic evaluation on intercultural dialogues, in which adding MANGO assertions to a dialogue system improved the overall quality, specificity, and cultural sensitivity of its responses, as judged by human annotators.
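The MANGO steps described above (prompting from two entry points, clustering, summarization) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the prompt templates, the `fake_llm` stub, the token-overlap clustering, and the "pick the longest member" summarizer are all hypothetical stand-ins chosen to keep the example self-contained. A real pipeline would presumably call an actual LLM and use a stronger similarity measure and a generative summarizer.

```python
from typing import Callable, List

# Hypothetical prompt templates; the paper's actual prompts are not shown here.
def concept_prompt(concept: str) -> str:
    return f"List cultural differences regarding the concept '{concept}'."

def culture_prompt(culture: str) -> str:
    return f"List notable cultural conventions of the culture '{culture}'."

def collect_assertions(concepts: List[str], cultures: List[str],
                       llm: Callable[[str], List[str]]) -> List[str]:
    """Query the LLM from both entry points and pool the raw assertions."""
    raw: List[str] = []
    for concept in concepts:
        raw.extend(llm(concept_prompt(concept)))
    for culture in cultures:
        raw.extend(llm(culture_prompt(culture)))
    return raw

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for a real similarity model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def cluster(assertions: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy single-pass clustering against each cluster's first member."""
    clusters: List[List[str]] = []
    for assertion in assertions:
        for members in clusters:
            if jaccard(assertion, members[0]) >= threshold:
                members.append(assertion)
                break
        else:
            # No sufficiently similar cluster found: start a new one.
            clusters.append([assertion])
    return clusters

def summarize(members: List[str]) -> str:
    """Placeholder for generative summarization: keep the longest member."""
    return max(members, key=len)

def run_pipeline(concepts: List[str], cultures: List[str],
                 llm: Callable[[str], List[str]]) -> List[str]:
    """Collect from both entry points, cluster, and emit one assertion per cluster."""
    raw = collect_assertions(concepts, cultures, llm)
    return [summarize(members) for members in cluster(raw)]

def fake_llm(prompt: str) -> List[str]:
    """Canned responses standing in for a real LLM call (illustrative data only)."""
    canned = {
        "tea": ["In Japan, tea is served in a formal ceremony"],
        "Japan": ["In Japan, tea is often served in a formal ceremony",
                  "In Japan, people bow when greeting"],
    }
    for key, answers in canned.items():
        if f"'{key}'" in prompt:
            return answers
    return []

if __name__ == "__main__":
    for assertion in run_pipeline(["tea"], ["Japan"], fake_llm):
        print(assertion)
```

In this toy run the two near-duplicate tea assertions, one reached via the concept and one via the culture, fall into the same cluster and are reduced to a single assertion. That is the role consolidation plays when both entry points surface overlapping knowledge.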