Abstract: Can large language models (LLMs) simulate social surveys? To answer this question, we conducted millions of simulations in which LLMs were asked to answer subjective questions. A comparison of different LLM responses with the European Social Survey (ESS) data suggests that the effect of prompts on bias and variability is fundamental, highlighting major cultural, age, and gender biases. We further discussed statistical methods for measuring the difference between LLM answers and survey data and proposed a novel measure inspired by Jaccard similarity, as LLM-generated responses are likely to have a smaller variance. Our experiments also reveal that it is important to analyze the robustness and variability of prompts before using LLMs to simulate social surveys, as their imitation abilities are approximate at best.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: Can large language models (LLMs) simulate social surveys? Specifically, through a large number of simulation experiments, the researchers explored the performance of LLMs when answering subjective questions and compared it with the data from the European Social Survey (ESS). The main objectives of the study include:

1. **Compare the answers of LLMs with real survey data**, especially for data from European countries.
2. **Analyze the effects of different models** and evaluate the impact of prompts on the results.
3. **Put forward suggestions and precautions** so that the actual situation can be more accurately reflected when using LLMs to simulate social surveys.

### Research Background

- **Adaptability and Diversity of LLMs**: Although LLMs are often referred to as "stochastic parrots", their adaptability and diversity enable them to perform well in many tasks, similar to chameleons.
- **Variables in Social Surveys**: Social surveys usually consider demographic variables such as gender, race, and age, and these variables can also be part of the prompts in LLMs.
- **Bias Issues**: There are various biases in LLMs, including cultural, gender, and age biases, which will affect the output of the model.

### Research Methods

- **Data Sources**: Use the data from the 10th round of the European Social Survey (ESS), covering 59,686 participants from 31 European countries.
- **Model Selection**: Tested models such as ChatGPT (including GPT-3.5 and GPT-4), LLaMA-2, LLaMA-3, Mistral, and DeepSeek-V2.
- **Prompt Design**: Designed a variety of prompts, including basic demographic information (such as year of birth, gender, place of residence) and occupational categories.
- **Evaluation Metrics**: Proposed a J-index based on Jaccard similarity to evaluate the distribution consistency between the simulation results of LLMs and the real survey data.

### Main Findings

- **Simulation Ability of LLMs**: LLMs can simulate social survey data well in some cases, but perform poorly in other cases, especially in terms of variance.
- **Impact of Prompts**: The design of prompts has a significant impact on the output of LLMs, and different prompts will lead to different results.
- **Geographical Imbalance**: There are differences in the simulation effects of different countries. For example, the data simulation effect in Bulgaria is poor.
- **Impact of Model Parameters**: Model parameters (such as temperature and top_p) also have a certain impact on the simulation results, but they are not decisive.

### Conclusions

- **Overall Evaluation**: Although LLMs can simulate social surveys in some aspects, there is still much room for improvement in their performance, especially in reducing biases and increasing variance.
- **Future Directions**: Further research is needed on how to optimize prompt design and model parameters to improve the accuracy of LLMs in social survey simulations.

Through these studies, the author hopes to provide valuable references and suggestions for using LLMs in social science research.
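
The summary names a J-index based on Jaccard similarity but does not give its formula. As a rough illustration only, the sketch below uses the weighted Jaccard (Ruzicka) similarity between two answer distributions: the sum of category-wise minima divided by the sum of category-wise maxima. This is an assumed stand-in, not the paper's definition, and the function names `distribution` and `jaccard_overlap` as well as the toy answer lists are hypothetical.

```python
from collections import Counter


def distribution(answers, categories):
    """Relative frequency of each answer category (assumed discrete scale)."""
    counts = Counter(answers)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no answers fall into the given categories")
    return [counts[c] / total for c in categories]


def jaccard_overlap(p, q):
    """Weighted Jaccard (Ruzicka) similarity of two distributions.

    Assumption: this is NOT taken from the paper; it is one common way
    to generalize Jaccard similarity to histograms.
    Returns 1.0 for identical distributions and 0.0 for disjoint ones.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same categories")
    numerator = sum(min(a, b) for a, b in zip(p, q))
    denominator = sum(max(a, b) for a, b in zip(p, q))
    return numerator / denominator if denominator > 0 else 1.0


# Toy data: survey answers are spread out, simulated answers are concentrated.
categories = [1, 2, 3, 4, 5]
survey_answers = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
llm_answers = [3] * 8 + [4] * 2

score = jaccard_overlap(
    distribution(survey_answers, categories),
    distribution(llm_answers, categories),
)
print(round(score, 3))  # 0.333
```

In this toy case the simulated answers have a similar center but far less spread than the survey answers, and the overlap score drops to about one third. A comparison of means alone would miss that gap, which is the kind of low-variance behavior the abstract cites as motivation for an overlap-style measure.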

Are Large Language Models Chameleons? An Attempt to Simulate Social Surveys

Large Language Models Show Human-like Social Desirability Biases in Survey Responses

Do LLMs exhibit human-like response biases? A case study in survey design

Simulating Field Experiments with Large Language Models

Sense and Sensitivity: Evaluating the simulation of social dynamics via Large Language Models

Social Science Meets LLMs: How Reliable Are Large Language Models in Social Simulations?

Questioning the Survey Responses of Large Language Models

Large Language Models as Subpopulation Representative Models: A Review

Are Large Language Models Consistent over Value-laden Questions?

Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs

Can Large Language Models Transform Computational Social Science?

Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue

Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers

You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments

ChatGPT vs Social Surveys: Probing the Objective and Subjective Human Society

Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis

Exploring Social Desirability Response Bias in Large Language Models: Evidence from GPT-4 Simulations

Modeling Human Subjectivity in LLMs Using Explicit and Implicit Human Factors in Personas

Can Large Language Models Capture Public Opinion about Global Warming? An Empirical Assessment of Algorithmic Fidelity and Bias

Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas

Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study