AI and the Problem of Knowledge Collapse

Andrew J. Peterson

2024-04-22

Abstract: While artificial intelligence has the potential to process vast amounts of data, generate new insights, and unlock greater productivity, its widespread adoption may entail unforeseen consequences. We identify conditions under which AI, by reducing the cost of access to certain modes of knowledge, can paradoxically harm public understanding. While large language models are trained on vast amounts of diverse data, they naturally generate output towards the 'center' of the distribution. This is generally useful, but widespread reliance on recursive AI systems could lead to a process we define as "knowledge collapse", which we argue could harm innovation and the richness of human understanding and culture. However, unlike AI models that cannot choose what data they are trained on, humans may strategically seek out diverse forms of knowledge if they perceive them to be worthwhile. To investigate this, we provide a simple model in which a community of learners or innovators chooses to use traditional methods or to rely on a discounted AI-assisted process, and we identify conditions under which knowledge collapse occurs. In our default model, a 20% discount on AI-generated content generates public beliefs 2.3 times further from the truth than when there is no discount. An empirical approach to measuring the distribution of LLM outputs is provided in theoretical terms and illustrated through a specific example comparing the diversity of outputs across different models and prompting styles. Finally, based on the results, we consider further research directions to counteract such outcomes.
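The abstract's core mechanism, AI output concentrated near the "center" of a distribution so that heavy reliance on it distorts public beliefs, can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's model: it assumes a standard normal "true" distribution, a hypothetical AI channel that only returns values within `cutoff` of the mean, a hypothetical share `ai_share` of samples drawn through that channel, and Hellinger distance between the samples' histogram and the true bin probabilities as the "distance from the truth". It does not model learners' choices or the discount, and it does not attempt to reproduce the 2.3x figure.

```python
import bisect
import math
import random

# Histogram bin edges for the assumed standard normal "truth"; the outer
# bins are open-ended so every sample lands in some bin.
EDGES = [-math.inf] + [i * 0.5 for i in range(-6, 7)] + [math.inf]


def normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def true_bin_probs(edges: list[float]) -> list[float]:
    """Probability mass the assumed true distribution puts in each bin."""
    return [normal_cdf(b) - normal_cdf(a) for a, b in zip(edges, edges[1:])]


def draw_truncated(rng: random.Random, cutoff: float) -> float:
    """Hypothetical AI channel: a normal draw restricted to |x| <= cutoff."""
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    while True:  # rejection sampling from the truncated normal
        x = rng.gauss(0.0, 1.0)
        if abs(x) <= cutoff:
            return x


def hellinger_to_truth(samples: list[float], edges: list[float] = EDGES) -> float:
    """Hellinger distance between the samples' histogram and the true bin masses."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        counts[bisect.bisect_right(edges, x) - 1] += 1
    n = len(samples)
    bc = sum(math.sqrt(c / n * q) for c, q in zip(counts, true_bin_probs(edges)))
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp tiny negative rounding error


def public_distance(ai_share: float, cutoff: float = 1.0,
                    n: int = 20_000, seed: int = 0) -> float:
    """Distance from the truth when a share `ai_share` of samples use the AI channel."""
    rng = random.Random(seed)
    samples = [
        draw_truncated(rng, cutoff) if rng.random() < ai_share else rng.gauss(0.0, 1.0)
        for _ in range(n)
    ]
    return hellinger_to_truth(samples)


if __name__ == "__main__":
    for share in (0.0, 0.5, 0.9):
        print(f"ai_share={share:.1f}  distance={public_distance(share):.3f}")
```

In this sketch the distance grows with `ai_share` because the tail bins become under-represented relative to the truth; the `cutoff`, `ai_share`, and bin choices are illustrative only.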

Artificial Intelligence, Computers and Society

What problem does this paper attempt to address?

The paper examines how artificial intelligence (AI) may affect the way knowledge is produced and shared, and introduces the concept of "knowledge collapse". Although AI can process large amounts of data and generate new insights, its widespread adoption may have unintended consequences. The author points out that although large language models (LLMs) are trained on diverse data, their outputs tend to concentrate near the "center" of that distribution. If people come to rely heavily on such systems, especially recursively, less common forms of knowledge may be neglected; the author calls this process knowledge collapse and argues it could harm innovation and the richness of human understanding and culture. To analyze when this occurs, the paper models a community of learners or innovators who choose between traditional methods and a discounted AI-assisted process. In the default model, a 20% discount on AI-generated content produces public beliefs 2.3 times further from the truth than when there is no discount. The author also discusses how humans, unlike models that cannot choose their training data, may counteract this by strategically seeking out diverse sources of knowledge when they judge them worthwhile, and proposes an empirical approach to measuring the diversity of LLM outputs, illustrated by comparing different models and prompting styles. The paper concludes with future research directions for preventing knowledge collapse, including addressing biases in AI algorithms, model collapse, and known issues with LLMs.
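The summary mentions a method for evaluating the diversity of LLM outputs. As a minimal sketch of one possible diversity measure (not necessarily the one used in the paper), assume each model response has already been coded into a discrete category; the function `diversity_stats` and all labels below are hypothetical. It reports the number of distinct categories, the Shannon entropy in bits, and the share taken by the most common category, so the same numbers can be computed separately for each model or prompting style.

```python
import math
from collections import Counter


def diversity_stats(labels: list[str]) -> dict[str, float]:
    """Summarize how spread out a set of categorized responses is.

    `labels` holds one (hypothetical) category per model response; mapping
    free-text answers to categories is assumed to happen beforehand.
    """
    if not labels:
        raise ValueError("need at least one response")
    counts = Counter(labels)
    n = len(labels)
    probs = [c / n for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)  # Shannon entropy in bits
    return {
        "distinct": len(counts),
        "entropy_bits": entropy,
        "top_share": max(counts.values()) / n,  # share of the most common answer
    }


# Hypothetical coded responses under two prompting styles (illustrative labels).
default_prompt = ["A", "A", "A", "A", "A", "A", "B", "B"]
diverse_prompt = ["A", "B", "C", "D", "A", "B", "C", "D"]
print(diversity_stats(default_prompt))
print(diversity_stats(diverse_prompt))  # {'distinct': 4, 'entropy_bits': 2.0, 'top_share': 0.25}
```

Entropy rises both with more categories and with a more even spread across them, while `top_share` makes concentration on a single "central" answer directly visible.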

Related papers

Artificial intelligence and illusions of understanding in scientific research

Amplifying Limitations, Harms and Risks of Large Language Models

Position: Stop Making Unscientific AGI Performance Claims

A narrowing of AI research?

From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution

Extinction Risks from AI: Invisible to Science?

Generative AI: An Existential Threat to Human Content Creators?

Problems in AI, their roots in philosophy, and implications for science and society

The Return of Pseudosciences in Artificial Intelligence: Have Machine Learning and Deep Learning Forgotten Lessons from Statistics and History?

How artificial intelligence affects education?

Synthetic Lies: Understanding AI-Generated Misinformation and Evaluating Algorithmic and Human Solutions

Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World

Misalignments in AI Perception: Quantitative Findings and Visual Mapping of How Experts and the Public Differ in Expectations and Risks, Benefits, and Value Judgments

When not to use machine learning: A perspective on potential and limitations

The rapid competitive economy of machine learning development: a discussion on the social risks and benefits

Human-AI Interactions and Societal Pitfalls

Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality

Social Evolution of Published Text and The Emergence of Artificial Intelligence Through Large Language Models and The Problem of Toxicity and Bias

Bias Amplification in Artificial Intelligence Systems

The artificial intelligence revolution...in unethical publishing: Will AI worsen our dysfunctional publishing system?