Concept-Guided Chain-of-Thought Prompting for Pairwise Comparison Scaling of Texts with Large Language Models

Patrick Y. Wu, Jonathan Nagler, Joshua A. Tucker, Solomon Messing
2023-10-18
Abstract: Existing text scaling methods often require a large corpus, struggle with short texts, or require labeled data. We develop a text scaling method that leverages the pattern recognition capabilities of generative large language models (LLMs). Specifically, we propose concept-guided chain-of-thought (CGCoT), which uses prompts designed to summarize ideas and identify target parties in texts to generate concept-specific breakdowns, in many ways similar to guidance for human coder content analysis. CGCoT effectively shifts pairwise text comparisons from a reasoning problem to a pattern recognition problem. We then pairwise compare concept-specific breakdowns using an LLM. We use the results of these pairwise comparisons to estimate a scale using the Bradley-Terry model. We use this approach to scale affective speech on Twitter. Our measures correlate more strongly with human judgments than alternative approaches like Wordfish. Besides a small set of pilot data to develop the CGCoT prompts, our measures require no additional labeled data and produce binary predictions comparable to a RoBERTa-Large model fine-tuned on thousands of human-labeled tweets. We demonstrate how combining substantive knowledge with LLMs can create state-of-the-art measures of abstract concepts.
Computation and Language, Computers and Society
What problem does this paper attempt to address?
This paper addresses the limitations of existing text scaling methods, which often require a large corpus, struggle with short texts, or depend on labeled data. The authors propose a text scaling method that leverages the pattern recognition capabilities of generative large language models (LLMs). Their approach, concept-guided chain-of-thought (CGCoT), uses a series of prompts that summarize the ideas in a text and identify its target parties, producing a concept-specific breakdown of each text, much like the guidance given to human coders in content analysis. An LLM then compares these breakdowns in pairs, which shifts pairwise text comparison from a reasoning problem to a pattern recognition problem, and the comparison results are used to estimate a scale with the Bradley-Terry model. Applied to affective speech on Twitter, the resulting measures correlate more strongly with human judgments than alternatives such as Wordfish. Apart from a small set of pilot data used to develop the CGCoT prompts, the method requires no additional labeled data, yet its binary predictions are comparable to those of a RoBERTa-Large model fine-tuned on thousands of human-labeled tweets. Overall, the paper shows how combining substantive knowledge with LLMs can produce state-of-the-art measures of abstract concepts.
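To make the scale estimation step concrete, here is a minimal sketch of fitting a Bradley-Terry model to pairwise comparison outcomes. Under that model, the probability that item i is preferred over item j is p_i / (p_i + p_j), where p_i is a positive strength parameter. This is not the authors' code: the function name `fit_bradley_terry`, the tweet identifiers, and the comparison outcomes are all hypothetical. In the paper's setting each (winner, loser) pair would come from an LLM judging which of two concept-specific breakdowns expresses more of the target concept. The sketch uses the standard minorization-maximization (MM) updates and assumes that every item wins at least one comparison and that no item is compared with itself.

```python
import math
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=500, tol=1e-10):
    """Fit Bradley-Terry strengths from (winner, loser) pairs via MM updates.

    Returns a dict mapping each item to its log-strength, centred to mean zero.
    """
    wins = defaultdict(int)         # total wins per item
    pair_counts = defaultdict(int)  # comparisons per unordered pair
    items = set()
    for winner, loser in comparisons:
        if winner == loser:
            raise ValueError("an item cannot be compared with itself")
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    # Assumption of this sketch: an item with zero wins has no finite estimate.
    if any(wins[i] == 0 for i in items):
        raise ValueError("every item must win at least one comparison")

    strength = {i: 1.0 for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            # Sum n_ij / (p_i + p_j) over all opponents j of item i.
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (strength[i] + strength[j])
            updated[i] = wins[i] / denom
        # Strengths are identified only up to scale, so rescale each round.
        total = sum(updated.values())
        updated = {i: v * len(items) / total for i, v in updated.items()}
        delta = max(abs(updated[i] - strength[i]) for i in items)
        strength = updated
        if delta < tol:
            break

    log_strength = {i: math.log(v) for i, v in strength.items()}
    mean = sum(log_strength.values()) / len(log_strength)
    return {i: v - mean for i, v in log_strength.items()}


# Hypothetical outcomes: each tuple is (winner, loser) for one comparison.
comparisons = (
    [("tweet_a", "tweet_b")] * 3 + [("tweet_b", "tweet_a")]
    + [("tweet_b", "tweet_c")] * 3 + [("tweet_c", "tweet_b")]
    + [("tweet_a", "tweet_c")] * 3 + [("tweet_c", "tweet_a")]
)
scores = fit_bradley_terry(comparisons)
for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{item}: {score:+.3f}")
```

The centred log-strengths serve as the continuous scale: a higher score means the text was more often judged to express the concept. With many texts it is usually impractical to compare every pair, so a real application would sample pairs while keeping the comparison graph connected.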