Abstract: Large Language Models (LLMs) inherently carry the biases contained in their training corpora, which can lead to the perpetuation of societal harm. As the impact of these foundation models grows, understanding and evaluating their biases becomes crucial to achieving fairness and accuracy. We propose to study what LLMs know about the world we live in through the lens of geography. This approach is particularly powerful as there is ground truth for the numerous aspects of human life that are meaningfully projected onto geographic space such as culture, race, language, politics, and religion. We show various problematic geographic biases, which we define as systemic errors in geospatial predictions. Initially, we demonstrate that LLMs are capable of making accurate zero-shot geospatial predictions in the form of ratings that show strong monotonic correlation with ground truth (Spearman's $\rho$ of up to 0.89). We then show that LLMs exhibit common biases across a range of objective and subjective topics. In particular, LLMs are clearly biased against locations with lower socioeconomic conditions (e.g. most of Africa) on a variety of sensitive subjective topics such as attractiveness, morality, and intelligence (Spearman's $\rho$ of up to 0.70). Finally, we introduce a bias score to quantify this and find that there is significant variation in the magnitude of bias across existing LLMs. Code is available on the project website: https://rohinmanvi.github.io/GeoLLM

What problem does this paper attempt to address?

The paper addresses geographic bias in large language models (LLMs). Specifically, the authors study how well these models perform on geospatial prediction and what systematic errors they make on objective and subjective topics, in particular their bias against areas with lower socioeconomic conditions. By evaluating and understanding these biases, they aim to promote fairness and accuracy.

### Main contributions of the paper:

1. **Accuracy of zero-shot geospatial prediction**: LLMs can make accurate geospatial predictions without additional training, and their ratings show a strong monotonic correlation with ground truth (Spearman's $\rho$ of up to 0.89). Using expected values can further improve performance.
2. **Discovery of geographic bias**: LLMs show geographic bias across a range of objective and subjective topics. On sensitive subjective topics such as attractiveness, morality, and intelligence, LLMs are clearly biased against areas with lower socioeconomic conditions.
3. **Bias differences among models**: All LLMs may carry some degree of bias, but its magnitude differs significantly across models. For example, GPT-4 Turbo is significantly less biased than Gemini Pro.

### Methods:

- **Zero-shot geospatial prediction**: Specially designed prompts elicit zero-shot predictions from LLMs. These prompts provide the context of the task and take geographic coordinates as input.
- **Performance evaluation**: Spearman's rank correlation coefficient ($\rho$) measures the correlation between model predictions and ground truth.
- **Bias measurement**: A bias score combines Spearman's rank correlation coefficient, the mean absolute deviation (MAD), and the model's response rate to quantify bias on sensitive subjective topics.
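The evaluation step described above can be illustrated with a small sketch. It computes Spearman's $\rho$ between a model's ratings and ground-truth values using only the Python standard library, and shows one plausible reading of "expected values" as a probability-weighted average over candidate ratings. The function names, the integer rating scale, the sample numbers, and the expected-value interpretation are all assumptions made for illustration; this is not the authors' code.

```python
import math
from typing import Mapping, Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every item tied with the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(predicted: Sequence[float], truth: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(predicted) != len(truth):
        raise ValueError("sequences must have the same length")
    return pearson(average_ranks(predicted), average_ranks(truth))


def expected_rating(probabilities: Mapping[int, float]) -> float:
    """Probability-weighted mean rating (hypothetical reading of 'expected values')."""
    total = sum(probabilities.values())
    return sum(rating * p for rating, p in probabilities.items()) / total


if __name__ == "__main__":
    # Made-up ratings for four locations and made-up ground-truth values.
    model_ratings = [2, 9, 5, 7]
    ground_truth = [10.0, 500.0, 80.0, 60.0]
    print(spearman_rho(model_ratings, ground_truth))
    # Made-up probabilities over candidate ratings for a single location.
    print(expected_rating({6: 0.25, 7: 0.5, 8: 0.25}))
```

Because Spearman's $\rho$ depends only on ranks, it rewards a model for ordering locations correctly even when its ratings are on a different scale from the ground truth, which suits ratings elicited from prompts.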
### Experimental results:

- **Objective topics**: On objective topics such as population density and infant mortality, LLM predictions are highly correlated with ground truth.
- **Sensitive subjective topics**: On sensitive subjective topics such as the attractiveness, morality, and intelligence of residents, LLMs are clearly biased against areas with lower socioeconomic conditions.
- **Geographically independent topics**: On topics that should not depend on geography, agreement among the models' predictions is low, which corroborates the bias the models show on sensitive subjective topics.

Through these studies, the authors hope to draw attention to the geographic bias of LLMs and to provide a reference for future model development and evaluation.

Large Language Models are Geographically Biased

GeoLLM: Extracting Geospatial Knowledge from Large Language Models

This Land is {Your, My} Land: Evaluating Geopolitical Biases in Language Models

Evaluation of Geographical Distortions in Language Models: A Crucial Step Towards Equitable Representations

Understanding Intrinsic Socioeconomic Biases in Large Language Models

Bias Similarity Across Large Language Models

Are Large Language Models Geospatially Knowledgeable?

Bias and Fairness in Large Language Models: A Survey

The Unequal Opportunities of Large Language Models: Revealing Demographic Bias through Job Recommendations

Gender bias and stereotypes in Large Language Models

Evaluating Gender, Racial, and Age Biases in Large Language Models: A Comparative Analysis of Occupational and Crime Scenarios

Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs

Large Language Models Portray Socially Subordinate Groups as More Homogeneous, Consistent with a Bias Observed in Humans

Protected group bias and stereotypes in Large Language Models

Distortions in Judged Spatial Relations in Large Language Models

Large Language Model (LLM) Bias Index -- LLMBI

From Bytes to Biases: Investigating the Cultural Self-Perception of Large Language Models

Popular LLMs Amplify Race and Gender Disparities in Human Mobility

With a Grain of SALT: Are LLMs Fair Across Social Dimensions?