A Sentence is Worth a Thousand Pictures: Can Large Language Models Understand Hum4n L4ngu4ge and the W0rld behind W0rds?

Evelina Leivada,Gary Marcus,Fritz Günther,Elliot Murphy

2024-09-04

Abstract: Modern Artificial Intelligence applications show great potential for language-related tasks that rely on next-word prediction. The current generation of Large Language Models (LLMs) has been linked to claims about human-like linguistic performance, and its applications are hailed both as a step towards artificial general intelligence and as a major advance in understanding the cognitive, and even neural, basis of human language. To assess these claims, first we analyze the contribution of LLMs as theoretically informative representations of a target cognitive system vs. atheoretical mechanistic tools. Second, we evaluate the models' ability to see the bigger picture, through top-down feedback from higher levels of processing, which requires grounding in previous expectations and past world experience. We hypothesize that since models lack grounded cognition, they cannot take advantage of these features and instead solely rely on fixed associations between represented words and word vectors. To assess this, we designed and ran a novel 'leet task' (l33t t4sk), which requires decoding sentences in which letters are systematically replaced by numbers. The results suggest that humans excel in this task whereas models struggle, confirming our hypothesis. We interpret the results by identifying the key abilities that are still missing from the current state of development of these models, which require solutions that go beyond increased system scaling.

Computation and Language

What problem does this paper attempt to address?

This paper asks whether large language models (LLMs) possess language understanding and cognitive abilities similar to those of humans. Specifically, it designs a novel "leet task" to test whether LLMs can, as humans do, use top-down feedback from higher levels of processing to decode sentences in which letters have been systematically replaced by numbers. The study finds that although some LLMs perform close to, or even surpass, humans on certain tasks, humans significantly outperform LLMs at decoding leet sentences. This indicates that current LLMs lack deep understanding grounded in real-world experience and prior expectations. The paper therefore examines the differences in language understanding between LLMs and humans and suggests directions for future improvement.
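To make the letter-to-number substitution concrete, here is a minimal Python sketch of how a leet-style stimulus could be produced and naively reversed. The mapping (a→4, e→3, o→0) is an assumption inferred from the substitutions visible in the paper's title and in "l33t t4sk"; the names `LEET_MAP`, `to_leet`, and `from_leet` are hypothetical, and the paper's actual stimuli and procedure may differ.

```python
from typing import Mapping

# Assumed mapping, inferred from the title ("Hum4n", "W0rld") and "l33t t4sk".
# The substitution scheme used in the paper's actual stimuli may differ.
LEET_MAP: Mapping[str, str] = {"a": "4", "e": "3", "o": "0"}


def to_leet(sentence: str, mapping: Mapping[str, str] = LEET_MAP) -> str:
    """Replace each mapped letter (case-insensitively) with its digit."""
    return "".join(mapping.get(ch.lower(), ch) for ch in sentence)


def from_leet(sentence: str, mapping: Mapping[str, str] = LEET_MAP) -> str:
    """Naively invert the mapping, digit by digit.

    This is a purely mechanical lookup: it is wrong for sentences that
    contain genuine digits, and it restores replaced letters as lowercase.
    """
    inverse = {digit: letter for letter, digit in mapping.items()}
    return "".join(inverse.get(ch, ch) for ch in sentence)


if __name__ == "__main__":
    print(to_leet("leet task"))           # l33t t4sk
    print(from_leet("Hum4n L4ngu4ge"))    # Human Language
```

A fixed lookup like `from_leet` only works when the mapping is known in advance and one-to-one. The task described above is of interest because a reader has no such table and must rely on sentence context and prior expectations to recover the words.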

Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve

Large Language Models Demonstrate the Potential of Statistical Learning in Language

How to Measure the Intelligence of Large Language Models?

Embers of autoregression show how large language models are shaped by the problem they are trained to solve

Large Language Model Displays Emergent Ability to Interpret Novel Literary Metaphors

Testing AI on language comprehension tasks reveals insensitivity to underlying meaning

Language in Vivo vs. in Silico: Size Matters but Larger Language Models Still Do Not Comprehend Language on a Par with Humans

Dissociating language and thought in large language models: a cognitive perspective

Large Language Models and the Reverse Turing Test

The Importance of Understanding Language in Large Language Models

Humanlike Cognitive Patterns as Emergent Phenomena in Large Language Models

Can large language models understand uncommon meanings of common words?

Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning

Symbols and grounding in large language models

Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding

LLMs' Understanding of Natural Language Revealed

Large Linguistic Models: Analyzing theoretical linguistic abilities of LLMs

On the Unexpected Abilities of Large Language Models

Large Language Models Lack Understanding of Character Composition of Words

Large Language Models Are In-Context Semantic Reasoners Rather Than Symbolic Reasoners