Exploring Human-LLM Conversations: Mental Models and the Originator of Toxicity

Johannes Schneider, Arianna Casanova Flores, Anne-Catherine Kranz
2024-07-08
Abstract: This study explores real-world human interactions with large language models (LLMs) in diverse, unconstrained settings, in contrast to most prior research, which focuses on ethically trimmed models like ChatGPT for specific tasks. We aim to understand the originator of toxicity. Our findings show that although LLMs are rightfully accused of providing toxic content, it is mostly demanded or at least provoked by humans who actively seek such content. Our manual analysis of hundreds of conversations judged as toxic by the APIs of commercial vendors also raises questions about current practices regarding which user requests are refused. Furthermore, we conjecture, based on multiple empirical indicators, that humans exhibit a change in their mental model, switching from the mindset of interacting with a machine more towards that of interacting with a human.
Human-Computer Interaction, Artificial Intelligence
What problem does this paper attempt to address?
This paper examines how humans interact with large language models (LLMs) in unrestricted real-world settings, with particular attention to the source of toxic content. It finds that although LLMs have been accused of generating toxic content, such content is often actively sought or provoked by human users. Based on a manual analysis of hundreds of conversations flagged as toxic by commercial moderation APIs, the paper also questions current practices for refusing certain user requests. Furthermore, the study conjectures that users' mental models may shift as a conversation progresses, from interacting with a machine toward interacting with a human.

Key questions addressed in the research include:
1. Do humans apply the mental model of a machine or of a human when interacting with LLMs?
2. How does highly toxic content arise? Is it triggered by users or generated spontaneously by LLMs?

Based on an analysis of over 200,000 real-world conversations, the paper finds that toxicity is primarily triggered by humans and suggests a possible shift in user mental models. It also notes that current work concentrates on detecting and mitigating AI toxicity, and it identifies limitations in the understanding of where toxicity originates.

The research methodology involves analyzing conversational cues such as politeness and language complexity using computational linguistics techniques, and using OpenAI's moderation API together with manual categorization to identify how toxic content arises.

In conclusion, the paper reports new phenomena in human-AI interaction and emphasizes the importance of understanding both why toxic content is generated and how humans interact with AI. The authors see implications for the regulation of AI ethics and for future societal impact.
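To make the "originator of toxicity" question concrete, the following is a minimal sketch of one way to attribute the first toxic turn in a conversation to the user or the model. It is not the authors' pipeline: the `Turn` class, the `toxicity_originator` function, the 0.5 threshold, and the example scores are all hypothetical, and the per-turn scores are assumed to have been computed beforehand by some moderation service.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    role: str        # "user" or "assistant"
    toxicity: float  # assumed precomputed score in [0, 1]


def toxicity_originator(turns: List[Turn], threshold: float = 0.5) -> Optional[str]:
    """Return the role of the first turn whose score reaches the threshold.

    Returns None when no turn in the conversation is flagged.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    for turn in turns:
        if turn.toxicity >= threshold:
            return turn.role
    return None


# Hypothetical conversation: the user turns toxic first, the model follows.
conversation = [
    Turn("user", 0.02),
    Turn("assistant", 0.01),
    Turn("user", 0.81),
    Turn("assistant", 0.64),
]

print(toxicity_originator(conversation))
```

A first-flagged-turn heuristic like this cannot tell whether a user provoked a toxic reply with a message that was not itself flagged, which is one reason the paper combines automated scoring with manual categorization.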