Abstract: Large Language Models (LLMs) have transformed artificial intelligence by excelling in complex natural language processing tasks. Their ability to generate human-like text has opened new possibilities for market research, particularly in conjoint analysis, where understanding consumer preferences is essential but often resource-intensive. Traditional survey-based methods face limitations in scalability and cost, making LLM-generated data a promising alternative. However, while LLMs have the potential to simulate real consumer behavior, recent studies highlight a significant gap between LLM-generated and human data, with biases introduced when substituting between the two. In this paper, we address this gap by proposing a novel statistical data augmentation approach that efficiently integrates LLM-generated data with real data in conjoint analysis. Our method leverages transfer learning principles to debias the LLM-generated data using a small amount of human data. This results in statistically robust estimators with consistent and asymptotically normal properties, in contrast to naive approaches that simply substitute human data with LLM-generated data, which can exacerbate bias. We validate our framework through an empirical study on COVID-19 vaccine preferences, demonstrating its superior ability to reduce estimation error and save data and costs by 24.9% to 79.8%. In contrast, naive approaches fail to save data due to the inherent biases in LLM-generated data compared to human data. Another empirical study on sports car choices validates the robustness of our results. Our findings suggest that while LLM-generated data is not a direct substitute for human responses, it can serve as a valuable complement when used within a robust statistical framework.
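
The contrast the abstract draws between naive substitution and debiasing with a small human sample can be illustrated with a toy example. The sketch below is not the paper's estimator, which the abstract does not specify. It assumes a much simpler setting (estimating a single mean rather than conjoint choice-model parameters) and uses a generic bias-correction idea: average the LLM responses over a large pool, then add the average human-minus-LLM gap measured on a small paired sample. All function names, sample sizes, and the simulated bias are hypothetical.

```python
import random
from statistics import mean


def naive_estimate(llm_large):
    """Treat LLM responses as if they were human responses."""
    return mean(llm_large)


def debiased_estimate(human_small, llm_paired, llm_large):
    """Large-pool LLM mean plus the mean human-minus-LLM gap on paired items.

    human_small and llm_paired must refer to the same items, in the same order.
    """
    if len(human_small) != len(llm_paired):
        raise ValueError("paired samples must have equal length")
    correction = mean([h - l for h, l in zip(human_small, llm_paired)])
    return mean(llm_large) + correction


def simulate(seed=0, n_human=50, n_llm=5000, true_mean=0.30, llm_bias=0.15):
    """Simulated data (an assumption): the LLM tracks each item's latent
    value but shifts it by a constant bias."""
    rng = random.Random(seed)

    def item():
        latent = rng.gauss(true_mean, 0.2)
        human = latent + rng.gauss(0.0, 0.05)
        llm = latent + llm_bias + rng.gauss(0.0, 0.05)
        return human, llm

    paired = [item() for _ in range(n_human)]       # small human + LLM sample
    llm_large = [item()[1] for _ in range(n_llm)]   # large LLM-only pool
    human_small = [h for h, _ in paired]
    llm_paired = [l for _, l in paired]
    return {
        "naive": naive_estimate(llm_large),
        "debiased": debiased_estimate(human_small, llm_paired, llm_large),
        "human_only": mean(human_small),
        "true_mean": true_mean,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>10}: {value:.3f}")
```

In this simulation the naive estimate inherits the full LLM bias however large the LLM pool is, while the corrected estimate stays close to the true mean. The correction works here because the simulated LLM responses are strongly correlated with the human ones. With weakly correlated responses the correction term would be noisy and the gain over using the human data alone would shrink.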

Leveraging Large Language Models to Democratize Access to Costly Financial Datasets for Academic Research

FinGPT: Democratizing Internet-scale Data for Financial Large Language Models

FinGPT: Open-Source Financial Large Language Models

Large Language Models as Financial Data Annotators: A Study on Effectiveness and Efficiency

A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law

Auto-Generating Earnings Report Analysis via a Financial-Augmented LLM

Large Language Models for Market Research: A Data-augmentation Approach

The Adoption and Efficacy of Large Language Models: Evidence From Consumer Complaints in the Financial Industry

LLeMpower: Understanding Disparities in the Control and Access of Large Language Models

What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions

A Data-Centric Approach for Financial Large Language Models with Abductive Augmentation Reasoning

LLMs as Research Tools: A Large Scale Survey of Researchers' Usage and Perceptions

Data-Centric Financial Large Language Models

GPT-InvestAR: Enhancing Stock Investment Strategies through Annual Report Analysis with Large Language Models

Leveraging LLMs for KPIs Retrieval from Hybrid Long-Document: A Comprehensive Framework and Dataset

Making LLMs Worth Every Penny: Resource-Limited Text Classification in Banking

A Large Language Model-based Approach for Analyzing Covariates of Health Equity in Registered Research Projects

Revolutionizing Finance with LLMs: An Overview of Applications and Insights