AI and the Problem of Knowledge Collapse

Andrew J. Peterson
2024-04-22
Abstract: While artificial intelligence has the potential to process vast amounts of data, generate new insights, and unlock greater productivity, its widespread adoption may entail unforeseen consequences. We identify conditions under which AI, by reducing the cost of access to certain modes of knowledge, can paradoxically harm public understanding. While large language models are trained on vast amounts of diverse data, they naturally generate output towards the 'center' of the distribution. This is generally useful, but widespread reliance on recursive AI systems could lead to a process we define as "knowledge collapse", which we argue could harm innovation and the richness of human understanding and culture. However, unlike AI models that cannot choose what data they are trained on, humans may strategically seek out diverse forms of knowledge if they perceive them to be worthwhile. To investigate this, we provide a simple model in which a community of learners or innovators chooses either to use traditional methods or to rely on a discounted AI-assisted process, and we identify conditions under which knowledge collapse occurs. In our default model, a 20% discount on AI-generated content generates public beliefs 2.3 times further from the truth than when there is no discount. An empirical approach to measuring the distribution of LLM outputs is provided in theoretical terms and illustrated through a specific example comparing the diversity of outputs across different models and prompting styles. Finally, based on the results, we consider further research directions to counteract such outcomes.
Artificial Intelligence, Computers and Society
What problem does this paper attempt to address?
The paper examines how artificial intelligence (AI) may affect the dissemination of knowledge and introduces the concept of "knowledge collapse". AI can process large amounts of data and generate insights, but its widespread adoption may have unforeseen consequences. The author points out that although large language models (LLMs) are trained on diverse data, their outputs tend toward the "center" of the distribution; widespread reliance on such systems could therefore narrow the range of knowledge in circulation, compromising innovation and the richness of human understanding. The paper presents a simple model in which a community of learners or innovators chooses between traditional methods and a discounted AI-assisted process, and uses it to identify the conditions under which knowledge collapse occurs. In the default model, a 20% discount on AI-generated content leaves public beliefs 2.3 times further from the truth than when there is no discount. The author also notes that humans, unlike AI models, can counteract this by strategically seeking out diverse forms of knowledge when they perceive them to be worthwhile, and proposes a method for measuring the diversity of LLM outputs across models and prompting styles. The paper concludes by suggesting future research directions to prevent knowledge collapse, including work on biases in AI algorithms, model collapse, and other known issues with LLMs.
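The narrowing effect described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's model: the standard normal distribution, the `cutoff` parameter, and the function names `sample_full`, `sample_ai`, and `public_spread` are all hypothetical choices made here for illustration. It only shows that when a larger share of a community draws from a source that returns values near the center of a distribution, the spread of what the community collectively observes shrinks.

```python
import random
import statistics


def sample_full(rng):
    # Assumption: "traditional" learning is a draw from a standard normal.
    return rng.gauss(0.0, 1.0)


def sample_ai(rng, cutoff=1.0):
    # Assumption: the AI-assisted process only returns values near the
    # center, modeled by rejection sampling within +/- cutoff.
    while True:
        x = rng.gauss(0.0, 1.0)
        if abs(x) <= cutoff:
            return x


def public_spread(ai_share, n=20000, cutoff=1.0, seed=0):
    """Standard deviation of n draws when a fraction ai_share of
    learners use the center-truncated source instead of the full one."""
    rng = random.Random(seed)
    draws = [
        sample_ai(rng, cutoff) if rng.random() < ai_share else sample_full(rng)
        for _ in range(n)
    ]
    return statistics.pstdev(draws)


if __name__ == "__main__":
    for share in (0.0, 0.5, 1.0):
        print(f"AI share {share:.1f}: spread {public_spread(share):.3f}")
```

The sketch holds the AI share fixed, whereas the paper lets learners choose a source based on its discounted cost, and it does not attempt to reproduce the 2.3 times figure reported in the abstract.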