A Sentence is Worth a Thousand Pictures: Can Large Language Models Understand Hum4n L4ngu4ge and the W0rld behind W0rds?

Evelina Leivada, Gary Marcus, Fritz Günther, Elliot Murphy
2024-09-04
Abstract: Modern Artificial Intelligence applications show great potential for language-related tasks that rely on next-word prediction. The current generation of Large Language Models (LLMs) has been linked to claims about human-like linguistic performance, and their applications are hailed both as a step towards artificial general intelligence and as a major advance in understanding the cognitive, and even neural, basis of human language. To assess these claims, first we analyze the contribution of LLMs as theoretically informative representations of a target cognitive system vs. atheoretical mechanistic tools. Second, we evaluate the models' ability to see the bigger picture, through top-down feedback from higher levels of processing, which requires grounding in previous expectations and past world experience. We hypothesize that since models lack grounded cognition, they cannot take advantage of these features and instead rely solely on fixed associations between represented words and word vectors. To assess this, we designed and ran a novel 'leet task' (l33t t4sk), which requires decoding sentences in which letters are systematically replaced by numbers. The results suggest that humans excel in this task whereas models struggle, confirming our hypothesis. We interpret the results by identifying the key abilities that are still missing from the current state of development of these models, which require solutions that go beyond increased system scaling.
Computation and Language
What problem does this paper attempt to address?
This paper asks whether large language models (LLMs) possess language understanding and cognitive abilities similar to those of humans. Specifically, it tests whether LLMs, like humans, can use top-down feedback from higher levels of processing to decode sentences in which letters are systematically replaced by numbers, using a novel "leet task" designed for this purpose. The study finds that although some LLMs perform close to or even surpass humans in certain tasks, humans significantly outperform LLMs at decoding leet sentences. The authors take this as evidence that current LLMs lack the grounding in real-world experience and prior expectations that deeper understanding requires. The paper therefore examines how language understanding differs between LLMs and humans, and argues that closing the gap will take more than increased scaling.
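
To make the letter-to-number substitution concrete, here is a minimal Python sketch of how a leet-style stimulus might be built and naively reversed. The substitution table (a→4, e→3, o→0) is an assumption inferred only from the examples visible in the title and abstract ("l33t t4sk", "W0rld"); the paper's actual mapping and stimuli are not given in this summary, and the function names `to_leet` and `from_leet` are hypothetical.

```python
# Hypothetical substitution table, inferred from "l33t t4sk" and "W0rld";
# not taken from the paper's materials.
LEET_MAP = {"a": "4", "e": "3", "o": "0"}


def to_leet(sentence, mapping=LEET_MAP):
    """Replace each mapped letter (case-insensitively) with its digit."""
    return "".join(mapping.get(ch.lower(), ch) for ch in sentence)


def from_leet(encoded, mapping=LEET_MAP):
    """Naively reverse the substitution by turning digits back into letters.

    This is a pure lookup: it restores lowercase letters only and would
    misread any digit that was a genuine number in the original sentence.
    """
    reverse = {digit: letter for letter, digit in mapping.items()}
    return "".join(reverse.get(ch, ch) for ch in encoded)


print(to_leet("leet task"))    # l33t t4sk
print(from_leet("l33t t4sk"))  # leet task
```

The lookup-based decoder only works when the table is known in advance and no real digits occur in the text. The task described above is harder than that: a reader has to recover the sentence from context and expectations, which is the top-down ability the paper sets out to test.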