Metacognitive Monitoring: A Human Ability Beyond Generative Artificial Intelligence

Markus Huff, Elanur Ulakçı
2024-10-17
Abstract: Large language models (LLMs) have shown impressive alignment with human cognitive processes, raising questions about the extent of their similarity to human cognition. This study investigates whether LLMs, specifically ChatGPT, possess metacognitive monitoring abilities akin to humans, particularly in predicting memory performance on an item-by-item basis. We employed a cross-agent prediction model to compare the metacognitive performance of humans and ChatGPT in a language-based memory task involving garden-path sentences preceded by either fitting or unfitting context sentences. Both humans and ChatGPT rated the memorability of these sentences; humans then completed a surprise recognition memory test. Our findings reveal a significant positive relationship between humans' memorability ratings and their actual recognition performance, indicating reliable metacognitive monitoring. In contrast, ChatGPT did not exhibit a similar predictive capability. Bootstrapping analyses demonstrated that none of the GPT models tested (GPT-3.5-turbo, GPT-4-turbo, GPT-4o) could accurately predict human memory performance on a per-item basis. This suggests that, despite their advanced language processing abilities and alignment with human cognition at the object level, current LLMs lack the metacognitive mechanisms that enable humans to anticipate their memory performance. These results highlight a fundamental difference between human and AI cognition at the metacognitive level. Addressing this gap is crucial for developing AI systems capable of effective self-monitoring and adaptation to human needs, thereby enhancing human-AI interactions across domains such as education and personalized learning.
Computation and Language
What problem does this paper attempt to address?
This paper asks whether large language models (LLMs) possess metacognitive monitoring abilities similar to those of humans, particularly the ability to predict memory performance. Specifically, the researchers tested whether ChatGPT can predict, item by item, how well individual sentences will be remembered, as humans can. Both humans and ChatGPT rated the relatedness and memorability of a series of sentence pairs, each consisting of a garden-path sentence preceded by a fitting or unfitting context sentence; the human participants then completed a surprise recognition memory test. Humans' memorability ratings were significantly and positively related to their actual recognition performance, indicating reliable metacognitive monitoring. ChatGPT's ratings showed no comparable predictive relationship, and this held for all three GPT models tested (GPT-3.5-turbo, GPT-4-turbo, GPT-4o). The study concludes that humans and current LLMs differ fundamentally at the metacognitive level, and argues that closing this gap matters for building AI systems that can monitor themselves, adapt to human needs, and thereby improve human-AI interaction.
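To make the idea of an item-level bootstrap concrete, the following is a minimal sketch of how one might check whether a set of memorability ratings predicts per-item recognition performance. It is not the authors' analysis: the summary does not specify their statistical model, so the percentile bootstrap of a Pearson correlation used here is an assumed stand-in, the function names `pearson` and `bootstrap_ci` are hypothetical, and all data are synthetic.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # Simplification (assumption): treat a constant resample as r = 0.
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(ratings, accuracy, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the rating-accuracy correlation.

    Items are resampled with replacement, keeping each item's rating
    paired with its recognition accuracy.
    """
    rng = random.Random(seed)
    n = len(ratings)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson([ratings[i] for i in idx],
                             [accuracy[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic illustration only: 40 hypothetical items.
rng = random.Random(1)
accuracy = [rng.uniform(0.4, 0.95) for _ in range(40)]
# "Informative" ratings track accuracy plus noise.
informative = [10 * a + rng.gauss(0, 1) for a in accuracy]
# "Uninformative" ratings are drawn independently of accuracy.
uninformative = [rng.uniform(1, 10) for _ in accuracy]

print("informative rater CI:  ", bootstrap_ci(informative, accuracy))
print("uninformative rater CI:", bootstrap_ci(uninformative, accuracy))
```

Under this sketch, a rater whose interval excludes zero would count as predicting per-item memory performance, while an interval spanning zero would not. The resampling unit (items rather than participants) and the choice of correlation coefficient are design assumptions that a real analysis would need to justify.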