BertaQA: How Much Do Language Models Know About Local Culture?

Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe
2024-06-11
Abstract: Large Language Models (LLMs) exhibit extensive knowledge about the world, but most evaluations have been limited to global or anglocentric subjects. This raises the question of how well these models perform on topics relevant to other cultures, whose presence on the web is not that prominent. To address this gap, we introduce BertaQA, a multiple-choice trivia dataset that is parallel in English and Basque. The dataset consists of a local subset with questions pertinent to the Basque culture, and a global subset with questions of broader interest. We find that state-of-the-art LLMs struggle with local cultural knowledge, even as they excel on global topics. However, we show that continued pre-training in Basque significantly improves the models' performance on Basque culture, even when queried in English. To our knowledge, this is the first solid evidence of knowledge transfer from a low-resource to a high-resource language. Our analysis sheds light on the complex interplay between language and knowledge, and reveals that some prior findings do not fully hold when reassessed on local topics. Our dataset and evaluation code are available under open licenses at https://github.com/juletx/BertaQA.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper examines how well large language models (LLMs) handle knowledge tied to a specific culture. Although current LLMs perform well on a wide range of tasks, most evaluations focus on English or global topics, so their command of knowledge relevant to other cultures has not been adequately assessed. To fill this gap, the authors introduce BertaQA, a multiple-choice trivia dataset that is parallel in English and Basque. It has two subsets: a local subset of questions about Basque culture and a global subset of questions of broader interest. Comparing performance on the two subsets, the study finds that state-of-the-art models perform well on global questions but struggle with local cultural knowledge.

The paper also examines the effect of continued pre-training on Basque corpora. This substantially improves performance on questions about Basque culture, even when the questions are posed in English. The authors present this as the first solid evidence of knowledge transfer from a low-resource language to a high-resource one, challenging the common view that adding more languages to a model harms its English performance. Finally, the paper compares the translate-test and self-translate approaches and finds them less effective on local questions than on global ones. In summary, the study stresses the importance of evaluating LLMs on both local and global topics.
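The local-versus-global comparison above comes down to measuring accuracy separately per subset and per query language. The sketch below illustrates that bookkeeping. It is not the authors' evaluation code. The `Item` record, its field names, and the rule of predicting the option with the highest model score (for example, a log-likelihood) are all assumptions made for illustration, and the example scores are made-up toy values, not results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical BertaQA-style record (field names are assumptions):
    # subset is "local" or "global", language is "en" or "eu",
    # answer is the index of the correct option.
    subset: str
    language: str
    answer: int
    option_scores: list[float]  # one model score per candidate answer


def predict(item: Item) -> int:
    # Assumed decision rule: choose the option with the highest score.
    return max(range(len(item.option_scores)), key=lambda i: item.option_scores[i])


def accuracy_by_group(items: list[Item]) -> dict[tuple[str, str], float]:
    # Accuracy keyed by (subset, language), e.g. ("local", "en").
    correct: dict[tuple[str, str], int] = defaultdict(int)
    total: dict[tuple[str, str], int] = defaultdict(int)
    for it in items:
        key = (it.subset, it.language)
        total[key] += 1
        correct[key] += int(predict(it) == it.answer)
    return {key: correct[key] / total[key] for key in total}


def local_global_gap(acc: dict[tuple[str, str], float], language: str) -> float:
    # Positive gap means the model does better on global than on local questions.
    return acc[("global", language)] - acc[("local", language)]


if __name__ == "__main__":
    # Toy scores for illustration only.
    toy = [
        Item("global", "en", 0, [-1.0, -3.0, -2.5]),
        Item("global", "en", 2, [-2.0, -1.5, -0.5]),
        Item("local", "en", 1, [-0.8, -2.0, -1.9]),
        Item("local", "en", 1, [-2.2, -0.4, -1.0]),
    ]
    acc = accuracy_by_group(toy)
    print(acc)
    print(local_global_gap(acc, "en"))
```

Because the dataset is parallel, the same grouping can also compare English against Basque queries for a single subset, which is the kind of comparison behind the observation that continued pre-training in Basque helps on local questions asked in English.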