Metacognitive Monitoring: A Human Ability Beyond Generative Artificial Intelligence

Markus Huff, Elanur Ulakçı

2024-10-17

Abstract: Large language models (LLMs) have shown impressive alignment with human cognitive processes, raising questions about the extent of their similarity to human cognition. This study investigates whether LLMs, specifically ChatGPT, possess metacognitive monitoring abilities akin to those of humans, particularly in predicting memory performance on an item-by-item basis. We employed a cross-agent prediction model to compare the metacognitive performance of humans and ChatGPT in a language-based memory task involving garden-path sentences preceded by either fitting or unfitting context sentences. Both humans and ChatGPT rated the memorability of these sentences; humans then completed a surprise recognition memory test. Our findings reveal a significant positive relationship between humans' memorability ratings and their actual recognition performance, indicating reliable metacognitive monitoring. In contrast, ChatGPT did not exhibit a similar predictive capability. Bootstrapping analyses demonstrated that none of the GPT models tested (GPT-3.5-turbo, GPT-4-turbo, GPT-4o) could accurately predict human memory performance on a per-item basis. This suggests that, despite their advanced language processing abilities and alignment with human cognition at the object level, current LLMs lack the metacognitive mechanisms that enable humans to anticipate their memory performance. These results highlight a fundamental difference between human and AI cognition at the metacognitive level. Addressing this gap is crucial for developing AI systems capable of effective self-monitoring and adaptation to human needs, thereby enhancing human-AI interactions across domains such as education and personalized learning.
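The cross-agent logic can be illustrated with a minimal sketch. It assumes that each sentence item has a mean human memorability rating, a mean model rating, and a human recognition rate (the numbers below are hypothetical toy values, not data from the study), and it scores each agent's monitoring accuracy as the item-level Pearson correlation between that agent's ratings and human recognition. The function names `pearson` and `cross_agent_accuracy` and the choice of a plain Pearson correlation are illustrative assumptions; the study's own statistical models may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def cross_agent_accuracy(ratings_by_agent, recognition):
    """Correlate each agent's per-item memorability ratings with human per-item recognition."""
    return {agent: pearson(ratings, recognition) for agent, ratings in ratings_by_agent.items()}


# Hypothetical toy data, one value per sentence item (not data from the study).
human_ratings = [5.1, 3.2, 4.4, 2.0, 6.3, 3.9]      # mean human memorability rating
model_ratings = [4.0, 4.2, 3.9, 4.1, 4.0, 4.3]      # mean LLM memorability rating
recognition = [0.82, 0.55, 0.71, 0.40, 0.90, 0.62]  # human recognition rate per item

scores = cross_agent_accuracy({"human": human_ratings, "model": model_ratings}, recognition)
for agent, r in scores.items():
    print(f"{agent}: r = {r:+.2f}")
```

In this toy setup, a strongly positive correlation for the human ratings alongside a non-positive one for the model ratings mirrors, qualitatively, the pattern the abstract describes.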

Computation and Language

What problem does this paper attempt to address?

The paper asks whether large language models (LLMs) possess metacognitive monitoring abilities similar to those of humans, specifically the ability to predict memory performance item by item. To test this, the researchers had both humans and ChatGPT rate the relatedness and memorability of a series of sentence pairs (a garden-path sentence preceded by a fitting or unfitting context sentence); the human participants then took a surprise recognition memory test. Humans' memorability ratings correlated significantly and positively with their actual recognition performance, indicating reliable metacognitive monitoring. ChatGPT's ratings showed no comparable predictive ability, and this held for every GPT model tested (GPT-3.5-turbo, GPT-4-turbo, GPT-4o). The study thus points to a fundamental difference between human and AI cognition at the metacognitive level and underscores the importance of developing AI systems capable of effective self-monitoring and adaptation, so that they better meet human needs and improve human-AI interaction.
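The abstract reports bootstrapping analyses of per-item prediction. One common way to gauge whether an item-level correlation is reliable is a percentile bootstrap over items, sketched minimally below: items are resampled with replacement, the correlation is recomputed for each resample, and the 2.5th and 97.5th percentiles form a 95% confidence interval. An interval that excludes zero suggests the ratings predict recognition. The synthetic data, the function name `bootstrap_ci`, and its parameters (`n_boot`, `alpha`, `seed`) are hypothetical; this illustrates the general technique and is not a reproduction of the study's analysis.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns None when either sequence has zero variance."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the item-level correlation, resampling items with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:  # skip degenerate resamples in which one sequence is constant
            stats.append(r)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi


# Hypothetical synthetic items (not study data): human ratings track recognition, model ratings do not.
gen = random.Random(1)
recognition = [gen.uniform(0.3, 0.9) for _ in range(40)]
human_ratings = [1 + 6 * p + gen.gauss(0, 0.3) for p in recognition]
model_ratings = [gen.uniform(1, 7) for _ in recognition]

human_ci = bootstrap_ci(human_ratings, recognition)
model_ci = bootstrap_ci(model_ratings, recognition)
print("human 95% CI:", human_ci)
print("model 95% CI:", model_ci)
```

Resampling whole items, rather than individual responses, keeps the unit of analysis at the per-item level at which the predictions are made.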

Towards a Psychology of Machines: Large Language Models Predict Human Memory

Generative AI as a metacognitive agent: A comparative mixed-method study with human participants on ICF-mimicking exam performance

Efficiently Measuring the Cognitive Ability of LLMs: an Adaptive Testing Perspective

MoT: Memory-of-Thought Enables ChatGPT to Self-Improve

Toward Autonomy: Metacognitive Learning for Enhanced AI Performance

LLM Cognitive Judgements Differ From Human

Challenging large language models' "intelligence" with human tools: A neuropsychological investigation in Italian language on prefrontal functioning

Cognitive Effects in Large Language Models

The Cognitive Capabilities of Generative AI: A Comparative Analysis with Human Benchmarks

Do large language models show decision heuristics similar to humans? A case study using GPT-3.5.

Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT

Thinking Fast and Slow in Large Language Models

Mind meets machine: Unravelling GPT-4's cognitive psychology

Testing theory of mind in large language models and humans

Working Memory Capacity of ChatGPT: An Empirical Study

Using cognitive psychology to understand GPT-3

Large Language Model Displays Emergent Ability to Interpret Novel Literary Metaphors

Human-like problem-solving abilities in large language models using ChatGPT

Humanlike Cognitive Patterns as Emergent Phenomena in Large Language Models

Does Using ChatGPT Result in Human Cognitive Augmentation?