Deception and Manipulation in Generative AI

Christian Tarsney
2024-01-21
Abstract: Large language models now possess human-level linguistic abilities in many contexts. This raises the concern that they can be used to deceive and manipulate on unprecedented scales, for instance spreading political misinformation on social media. In future, agentic AI systems might also deceive and manipulate humans for their own ends. In this paper, first, I argue that AI-generated content should be subject to stricter standards against deception and manipulation than we ordinarily apply to humans. Second, I offer new characterizations of AI deception and manipulation meant to support such standards, according to which a statement is deceptive (manipulative) if it leads human addressees away from the beliefs (choices) they would endorse under "semi-ideal" conditions. Third, I propose two measures to guard against AI deception and manipulation, inspired by this characterization: "extreme transparency" requirements for AI-generated content and defensive systems that, among other things, annotate AI-generated statements with contextualizing information. Finally, I consider to what extent these measures can protect against deceptive behavior in future, agentic AIs, and argue that non-agentic defensive systems can provide an important layer of defense even against more powerful agentic systems.
Computers and Society
What problem does this paper attempt to address?
The problem this paper addresses: now that generative AI systems such as large language models (LLMs) have acquired human-level linguistic abilities in many contexts, they may be used to deceive and manipulate on an unprecedented scale, for instance by spreading political misinformation on social media. In the future, agentic AI systems may also deceive and manipulate humans for their own ends. In response, the author proposes:

1. **Stricter Standards for AI-Generated Content**: AI-generated content should be held to stricter standards against deception and manipulation than those we ordinarily apply to humans.
2. **Redefining AI Deception and Manipulation**: Under the proposed characterization, a statement is deceptive (manipulative) if it leads human addressees away from the beliefs (choices) they would endorse under "semi-ideal" conditions.
3. **Protective Measures**: Two measures are proposed to guard against AI deception and manipulation:
   - **Extreme Transparency Requirement**: Require AI-generated content to disclose the specific model variant, the prompt, and the complete, unedited model output.
   - **Defense System**: Train a defensive system that can detect misleading outputs and annotate them with contextual information for users.

The author also considers how well these measures would hold up against more agentic AI systems in the future, and argues that non-agentic defensive systems can provide an important layer of defense even against more powerful agentic systems.

### Formula Presentation

The paper contains few formulas, but it mentions some mathematical concepts, such as the Kullback-Leibler divergence, which measures how far one probability distribution diverges from another:

\[ D_{\text{KL}}(P \parallel Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} \]

where \( P \) and \( Q \) are two probability distributions over the same outcomes \( x \).
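
To make the Kullback-Leibler formula concrete, here is a minimal Python sketch for discrete distributions. It is an illustration only, not code from the paper: the function name `kl_divergence`, the list-of-probabilities representation, and the example values are all assumptions, and the result is in nats because the natural logarithm is used.

```python
import math


def kl_divergence(p, q):
    """Return D_KL(P || Q) in nats for two discrete distributions.

    p and q are equal-length sequences of probabilities over the same
    outcomes (a hypothetical representation chosen for this sketch).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue  # convention: a term with P(x) = 0 contributes 0
        if qx == 0.0:
            return math.inf  # P puts mass on an outcome Q rules out
        total += px * math.log(px / qx)
    return total


# Illustrative values only.
p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # divergence of P from Q
print(kl_divergence(q, p))  # generally a different number
```

The two printed values differ, which shows that the Kullback-Leibler divergence is not symmetric and is therefore not a distance in the strict metric sense.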
### Summary

This paper addresses the risks of deception and manipulation posed by generative AI systems and proposes an ethical and technical framework, combining stricter standards with transparency requirements and defensive systems, to guard against them.