A Linguistic Comparison between Human and ChatGPT-Generated Conversations

Morgan Sandler, Hyesun Choung, Arun Ross, Prabu David
2024-04-26
Abstract: This study explores linguistic differences between human and LLM-generated dialogues, using 19.5K dialogues generated by ChatGPT-3.5 as a companion to the EmpathicDialogues dataset. The research employs Linguistic Inquiry and Word Count (LIWC) analysis, comparing ChatGPT-generated conversations with human conversations across 118 linguistic categories. Results show greater variability and authenticity in human dialogues, but ChatGPT excels in categories such as social processes, analytical style, cognition, attentional focus, and positive emotional tone, reinforcing recent findings of LLMs being "more human than human." However, no significant difference was found in positive or negative affect between ChatGPT and human dialogues. Classifier analysis of dialogue embeddings indicates implicit coding of the valence of affect despite no explicit mention of affect in the conversations. The research also contributes a novel, companion ChatGPT-generated dataset of conversations between two independent chatbots, which were designed to replicate a corpus of human conversations available for open access and used widely in AI research on language modeling. Our findings enhance understanding of ChatGPT's linguistic capabilities and inform ongoing efforts to distinguish between human and LLM-generated text, which is critical in detecting AI-generated fakes, misinformation, and disinformation.
Computation and Language, Artificial Intelligence, Computers and Society
What problem does this paper attempt to address?
This paper examines the linguistic differences between human conversations and conversations generated by large language models (LLMs). The researchers generated 19.5K conversations with ChatGPT-3.5 as a companion to the EmpathicDialogues dataset, then used Linguistic Inquiry and Word Count (LIWC) analysis to compare the ChatGPT-generated conversations with the human conversations across 118 linguistic categories. The research aims to:

1. **Construct a novel conversation dataset**: Create 2GPTEmpathicDialogues, a dataset of 19.5K conversations between two independent ChatGPT chatbots, as a companion resource for the EmpathicDialogues dataset.
2. **Compare linguistic features**: Compare human and ChatGPT conversations across 118 linguistic categories, including social processes, analytical style, cognition, attentional focus, and positive emotional tone.
3. **Analyze implicit encoding of affect**: Apply a classifier to dialogue embeddings to test whether the valence of affect is encoded implicitly, even when affect is not explicitly mentioned in the conversation.

The results show that human conversations exhibit greater variability and authenticity, whereas ChatGPT scores higher in social processes, analytical style, cognition, attentional focus, and positive emotional tone. The study also found no significant difference between ChatGPT and human conversations in positive or negative affect. These findings improve understanding of ChatGPT's linguistic capabilities and inform efforts to distinguish human-generated from LLM-generated text.
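
The category-level comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: LIWC is a proprietary tool with its own dictionary, so the three-category `TOY_LEXICON` below is a hypothetical stand-in, and the function names `category_rates` and `summarize` are assumptions made for illustration. The sketch scores each dialogue as the percentage of its words that fall in each category, then reports the per-category mean and standard deviation for a group of dialogues, which is one simple way to look at both average level and variability.

```python
import re
from statistics import mean, pstdev

# Hypothetical toy lexicon (an assumption for illustration only);
# the real LIWC dictionary is proprietary and covers far more categories.
TOY_LEXICON = {
    "social": {"you", "we", "friend", "family", "they"},
    "cognition": {"think", "know", "because", "maybe", "understand"},
    "pos_tone": {"good", "great", "happy", "love", "glad"},
}


def category_rates(text, lexicon=TOY_LEXICON):
    """Return, per category, the percentage of words in `text` that match it."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid division by zero on empty input
    return {
        cat: 100.0 * sum(w in vocab for w in words) / total
        for cat, vocab in lexicon.items()
    }


def summarize(dialogues, lexicon=TOY_LEXICON):
    """Return {category: (mean, population std dev)} over a list of dialogues."""
    rates = [category_rates(d, lexicon) for d in dialogues]
    return {
        cat: (
            mean([r[cat] for r in rates]),
            pstdev([r[cat] for r in rates]),
        )
        for cat in lexicon
    }


if __name__ == "__main__":
    # Made-up example strings, not drawn from either dataset.
    group_a = ["I think we lost the game", "My friend was so happy"]
    group_b = ["I understand, you must be glad", "That is great, I know you love it"]
    for name, group in (("group_a", group_a), ("group_b", group_b)):
        print(name, summarize(group))
```

A real analysis would replace the toy lexicon with LIWC output for all 118 categories and add a significance test for each category; the source summary does not say which test was used, so none is assumed here.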