Abstract: Cultural bias is pervasive in many large language models (LLMs), largely due to the deficiency of data representative of different cultures. Typically, cultural datasets and benchmarks are constructed either by extracting subsets of existing datasets or by aggregating from platforms such as Wikipedia and social media. However, these approaches are highly dependent on real-world data and human annotations, making them costly and difficult to scale. Inspired by cognitive theories on social communication, this paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection. CulturePark simulates cross-cultural human communication with LLM-based agents playing roles in different cultures. It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs. Using CulturePark, we generated 41,000 cultural samples to fine-tune eight culture-specific LLMs. We evaluated these models across three downstream tasks: content moderation, cultural alignment, and cultural education. Results show that for content moderation, our GPT-3.5-based models either match or outperform GPT-4 on datasets. Regarding cultural alignment, our models surpass GPT-4 on Hofstede's VSM 13 framework. Furthermore, for cultural education of human participants, our models demonstrate superior outcomes in both learning efficacy and user experience compared to GPT-4. CulturePark proves an important step in addressing cultural bias and advancing the democratization of AI, highlighting the critical role of culturally inclusive data in model training. Code is released at https://github.com/Scarelette/CulturePark.

What problem does this paper attempt to address?

This paper addresses cultural bias in large language models (LLMs). Current LLMs tend to favor the mainstream cultures in their training data and neglect others, so they understand and reflect different cultures incompletely and may even exacerbate social conflicts. The paper attributes this bias mainly to training corpora dominated by English data expressing Western values and viewpoints, while data from other cultures are scarce, that is, low-resource.

To meet this challenge, the paper introduces CulturePark, an LLM-based multi-agent communication framework that simulates cross-cultural communication. CulturePark has agents from different cultures hold multi-round conversations, producing high-quality cross-cultural dialogue datasets that capture human beliefs, norms, and customs. These datasets can then be used to fine-tune culture-specific LLMs and improve their performance on downstream tasks.

Using 41,000 cultural samples generated by CulturePark, the researchers fine-tuned eight culture-specific LLMs and evaluated them on three downstream tasks: content moderation, cultural alignment, and cultural education. For content moderation, the GPT-3.5-based models match or outperform GPT-4 on 41 datasets. For cultural alignment, the models surpass GPT-4 on Hofstede's VSM 13 framework. For cultural education, the models also outperform GPT-4 in learning efficacy and user experience.

In conclusion, CulturePark offers a cost-effective way to reduce cultural bias in LLMs and to promote cross-cultural understanding and communication by generating diverse cultural data.
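The multi-round conversation between agents from different cultures can be illustrated with a minimal sketch. This is not the authors' implementation (their released code is the authoritative version); the names `Agent`, `simulate_dialogue`, and `echo_responder` are hypothetical, and the responder is a stand-in for a real LLM call so that the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# One dialogue turn: (speaker name, utterance).
Turn = Tuple[str, str]
# Hypothetical interface: maps (system prompt, dialogue history) to a reply.
Responder = Callable[[str, List[Turn]], str]


@dataclass
class Agent:
    """A role-playing agent tied to one culture (illustrative only)."""
    name: str
    culture: str
    respond: Responder

    def system_prompt(self) -> str:
        # Assumed prompt wording; the paper's actual prompts are not given here.
        return f"You are {self.name}, speaking from the perspective of {self.culture}."

    def reply(self, history: List[Turn]) -> str:
        return self.respond(self.system_prompt(), history)


def simulate_dialogue(agents: List[Agent], topic: str, rounds: int) -> List[Turn]:
    """Let every agent speak once per round, each seeing the full history."""
    history: List[Turn] = [("moderator", topic)]
    for _ in range(rounds):
        for agent in agents:
            history.append((agent.name, agent.reply(history)))
    return history


def echo_responder(system_prompt: str, history: List[Turn]) -> str:
    """Placeholder for an LLM call: acknowledges the previous speaker."""
    last_speaker, _ = history[-1]
    return f"[{len(history)}] responding to {last_speaker}"


if __name__ == "__main__":
    agents = [
        Agent("agent_a", "Culture A", echo_responder),
        Agent("agent_b", "Culture B", echo_responder),
    ]
    for speaker, text in simulate_dialogue(agents, "How are weddings celebrated?", rounds=2):
        print(f"{speaker}: {text}")
```

Swapping `echo_responder` for a function that calls an actual model would turn the returned history into candidate dialogue data; any filtering or conversion into fine-tuning samples is outside the scope of this sketch.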

CulturePark: Boosting Cross-cultural Understanding in Large Language Models
