Preventing and Detecting Misinformation Generated by Large Language Models

Aiwei Liu,Qiang Sheng,Xuming Hu
DOI: https://doi.org/10.1145/3626772.3661377
2024-01-01
Abstract:As large language models (LLMs) become increasingly capable and widely deployed, the risk of them generating misinformation poses a critical challenge. Misinformation from LLMs can take various forms, from factual errors due to hallucination to intentionally deceptive content, and can have severe consequences in high-stakes domains.This tutorial covers comprehensive strategies to prevent and detect misinformation generated by LLMs. We first introduce the types of misinformation LLMs can produce and their root causes. We then explore two broad categories: Preventing misinformation generation: a) AI alignment training techniques to reduce LLMs' propensity for misinformation and refuse malicious instructions during model training. b) Training-free mitigation methods like prompt guardrails, retrieval-augmented generation (RAG), and decoding strategies to curb misinformation at inference time. Detecting misinformation after generation, including a) using LLMs themselves to detect misinformation through embedded knowledge or retrieval-enhanced judgments, and b) distinguishing LLM-generated text from human-written text through black-box approaches (e.g., classifiers, probability analysis) and white-box approaches (e.g., watermarking). We also discuss the challenges and limitations of detecting LLM-generated misinformation.
What problem does this paper attempt to address?