To Tell The Truth: Language of Deception and Language Models

Sanchaita Hazra, Bodhisattwa Prasad Majumder
2024-04-08
Abstract: Text-based misinformation permeates online discourses, yet evidence of people's ability to discern truth from such deceptive textual content is scarce. We analyze a novel TV game show data where conversations in a high-stake environment between individuals with conflicting objectives result in lies. We investigate the manifestation of potentially verifiable language cues of deception in the presence of objective truth, a distinguishing feature absent in previous text-based deception datasets. We show that there exists a class of detectors (algorithms) that have similar truth detection performance compared to human subjects, even when the former accesses only the language cues while the latter engages in conversations with complete access to all potential sources of cues (language and audio-visual). Our model, built on a large language model, employs a bottleneck framework to learn discernible cues to determine truth, an act of reasoning in which human subjects often perform poorly, even with incentives. Our model detects novel but accurate language cues in many cases where humans failed to detect deception, opening up the possibility of humans collaborating with algorithms and ameliorating their ability to detect the truth.
Artificial Intelligence
What problem does this paper attempt to address?
This paper asks whether truthful statements can be told apart from deceptive ones using only the linguistic cues in text, without any other multimodal cues. The researchers use a distinctive dataset: dialogues from the TV game show "To Tell The Truth". These conversations take place in a high-stakes environment between individuals with conflicting objectives, which leads to lies. The study poses two main questions:

1. **Are there sufficient linguistic cues** to distinguish truth from deception when other multimodal cues (such as visual or auditory cues) are absent?
2. **Is there a class of algorithmic detectors** that can recognize these linguistic cues and identify the truth through an effective chain of reasoning?

To answer these questions, the researchers built a model on top of a large language model, using a bottleneck framework to learn discernible cues and determine the truth from them. The model reaches truth-detection performance similar to that of human subjects, even though it sees only the language cues while humans had access to all sources of cues. In many cases where humans failed to detect deception, the model identified novel but accurate linguistic cues. This opens up the possibility of humans collaborating with algorithms to improve their ability to identify the truth.

The dataset used in the paper, "To Tell The Truth from Text" (T4TEXT), was extracted from the 1950s TV game show and comes with objective ground truth, a feature absent from previous text-based deception datasets. By analyzing these dialogues, the researchers identified key linguistic cues for detecting deception, such as ambiguity, overconfidence, and half-true statements.
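The bottleneck idea described above can be illustrated with a toy sketch: text is first reduced to a small set of named cues, and the final decision is made from those cues alone. This is not the paper's model, which is built on a large language model. The cue lexicons, the weights, and the function names `extract_cues`, `deception_score`, and `pick_truth_teller` are all hypothetical and chosen only to show the structure.

```python
import re
from typing import Dict

# Hypothetical cue lexicons, invented for illustration (not from the paper).
CUE_LEXICONS = {
    "ambiguity": {"maybe", "perhaps", "somewhat", "probably", "roughly"},
    "overconfidence": {"definitely", "absolutely", "certainly", "always", "never"},
}

# Hypothetical hand-set weights; a real system would learn these from labeled data.
CUE_WEIGHTS = {"ambiguity": 1.0, "overconfidence": 1.0}


def extract_cues(utterance: str) -> Dict[str, int]:
    """Bottleneck step: reduce raw text to counts of a few named cues."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    return {
        cue: sum(token in lexicon for token in tokens)
        for cue, lexicon in CUE_LEXICONS.items()
    }


def deception_score(cues: Dict[str, int]) -> float:
    """Prediction step: sees only the cue counts, never the raw text."""
    return sum(CUE_WEIGHTS[cue] * count for cue, count in cues.items())


def pick_truth_teller(statements: Dict[str, str]) -> str:
    """Return the speaker whose statement has the lowest deception score."""
    scores = {
        speaker: deception_score(extract_cues(text))
        for speaker, text in statements.items()
    }
    return min(scores, key=scores.get)
```

Because the prediction step only receives the cue counts, every decision can be traced back to the cues that produced it, which is the property that makes a bottleneck design useful for human inspection.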
In conclusion, this paper aims to improve deception detection in a text-only setting by constructing a model that can effectively identify linguistic cues. This matters both for understanding deceptive behavior in human communication and for developing more effective deception-detection tools.