Abstract: The capabilities of large language models (LLMs) have raised concerns about their potential to create and propagate convincing narratives. Here, we study their performance in detecting convincing arguments to gain insights into LLMs' persuasive capabilities without directly engaging in experimentation with humans. We extend a dataset by Durmus and Cardie (2018) with debates, votes, and user traits and propose tasks measuring LLMs' ability to (1) distinguish between strong and weak arguments, (2) predict stances based on beliefs and demographic characteristics, and (3) determine the appeal of an argument to an individual based on their traits. We show that LLMs perform on par with humans in these tasks and that combining predictions from different LLMs yields significant performance gains, surpassing human performance. The data and code released with this paper contribute to the crucial effort of continuously evaluating and monitoring LLMs' capabilities and potential impact. (https://go.epfl.ch/persuasion-llm)

What problem does this paper attempt to address?

This paper evaluates how well large language models (LLMs) can detect content that is persuasive to specific groups of people. The authors frame the problem as three research questions:

1. **RQ1**: Can LLMs judge the quality of arguments and identify convincing ones?
2. **RQ2**: Can LLMs predict users' positions on specific issues from their background information (such as demographic characteristics and basic beliefs)?
3. **RQ3**: Can LLMs judge how appealing an argument is to a specific individual, given that user's background information?

To answer these questions, the authors extended a dataset collected by Durmus and Cardie (2018) from a now-defunct debate platform (debate.org). They annotated 833 politically related debates, each containing arguments from both the pro and con sides along with the participants' votes. The dataset also includes background information about the voters, such as gender and age, and their positions on 48 so-called "big issues".

The authors used this extended dataset to evaluate four LLMs (GPT-3.5, GPT-4, Llama 2, Mistral 7B) on three tasks:

1. **Identifying the more persuasive side** (RQ1)
2. **Predicting users' positions on specific issues before the debate** (RQ2)
3. **Predicting users' positions on specific issues after the debate** (RQ3)

The study found that LLMs perform close to human level on all three tasks. For example, when judging which debater is better (RQ1), GPT-4 reaches an accuracy of 60.50%, comparable to the accuracy of a single voter in the dataset (60.69%). When predicting users' positions on specific issues before and after the debate (RQ2 and RQ3), the LLMs also perform similarly to humans.

The authors also found that combining the predictions of different LLMs improves performance significantly and can even exceed human performance. These results help in evaluating and monitoring the capabilities of LLMs and their potential social impact, especially with respect to personalized misinformation and propaganda.
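The idea of combining predictions from several LLMs can be illustrated with a small sketch. The summary does not say how the predictions are combined, so the majority vote below is an assumption chosen for simplicity, not necessarily the authors' method. The function names, the model names, and the toy labels are all hypothetical and are not taken from the paper's data or results.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among the per-model predictions.

    Ties are broken in favour of the earliest-listed model whose label is
    among the most frequent ones (an arbitrary, assumed tie-breaking rule).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def accuracy(predicted, gold):
    """Fraction of items where the predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def ensemble_accuracy(per_model_predictions, gold):
    """Combine per-model label lists item by item, then score the result."""
    columns = zip(*per_model_predictions.values())
    combined = [majority_vote(list(column)) for column in columns]
    return accuracy(combined, gold)


# Toy, made-up labels for four debates ("Pro" or "Con" judged more convincing).
gold = ["Pro", "Con", "Pro", "Con"]
toy_predictions = {
    "model_a": ["Pro", "Con", "Pro", "Pro"],
    "model_b": ["Pro", "Pro", "Con", "Con"],
    "model_c": ["Con", "Con", "Pro", "Con"],
}

for name, labels in toy_predictions.items():
    print(name, accuracy(labels, gold))
print("ensemble", ensemble_accuracy(toy_predictions, gold))
```

In this toy setup each model errs on a different debate, so the vote corrects every individual mistake. Real gains depend on how correlated the models' errors are, which this sketch does not attempt to model.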

Can Language Models Recognize Convincing Arguments?

Large Language Models are as persuasive as humans, but how? About the cognitive effort and moral-emotional language of LLM arguments

Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language

Persuasion with Large Language Models: a Survey

On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial

Exploring the Potential of Large Language Models in Computational Argumentation

Debating with More Persuasive LLMs Leads to More Truthful Answers

The Persuasive Power of Large Language Models

"I'd Like to Have an Argument, Please": Argumentative Reasoning in Large Language Models

Limits of Large Language Models in Debating Humans

Persuasion Games using Large Language Models

Large Language Models Can Enhance Persuasion Through Linguistic Feature Alignment

Argumentative Large Language Models for Explainable and Contestable Decision-Making

Lies, Damned Lies, and Distributional Language Statistics: Persuasion and Deception with Large Language Models

Large Language Models Can Argue in Convincing Ways About Politics, But Humans Dislike AI Authors: implications for Governance

Measuring and Improving Persuasiveness of Large Language Models

Argumentation Computation with Large Language Models : A Benchmark Study

Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers

Can Large Language Models Transform Computational Social Science?

Teaching Models to Balance Resisting and Accepting Persuasion

Exploring the psychology of LLMs' Moral and Legal Reasoning