Working Memory Capacity of ChatGPT: An Empirical Study

Dongyu Gong, Xingchen Wan, Dingmin Wang
2024-02-02
Abstract: Working memory is a critical aspect of both human intelligence and artificial intelligence, serving as a workspace for the temporary storage and manipulation of information. In this paper, we systematically assess the working memory capacity of ChatGPT, a large language model developed by OpenAI, by examining its performance in verbal and spatial n-back tasks under various conditions. Our experiments reveal that ChatGPT has a working memory capacity limit strikingly similar to that of humans. Furthermore, we investigate the impact of different instruction strategies on ChatGPT's performance and observe that the fundamental patterns of a capacity limit persist. From our empirical findings, we propose that n-back tasks may serve as tools for benchmarking the working memory capacity of large language models and hold potential for informing future efforts aimed at enhancing AI working memory.
Artificial Intelligence, Computation and Language, Neurons and Cognition
What problem does this paper attempt to address?
The paper asks how much working memory a large language model (LLM) such as ChatGPT has, and measures it experimentally. The researchers designed a series of n-back tasks, covering both verbal and spatial working memory, and ran them under various conditions. In an n-back task, items are presented one at a time and the participant must report whether the current item matches the one presented n steps earlier. The experiments show that ChatGPT's working memory capacity is limited and that the limit is strikingly similar to that of humans. The study also investigated different instruction strategies: certain prompting techniques improve the model's performance, but the basic pattern of a capacity limit persists. In addition, the study compared the performance of different LLMs on n-back tasks and reports that the proposed metrics reflect the general capabilities of those models. Based on these empirical findings, the authors propose that n-back tasks can serve as a benchmark for the working memory capacity of LLMs and can inform future efforts to enhance AI working memory. The persistent limit may reflect a fundamental constraint within the model's architecture, suggesting that ChatGPT's internal working memory mechanisms are somewhat analogous to those of humans.
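To make the n-back procedure concrete, the following is a minimal sketch of how a verbal n-back trial block could be generated and scored. It is an illustration under stated assumptions, not the authors' implementation: the function names, the letter set, the block length of 24 with 8 matches, and the use of hit rate, false alarm rate, and a clipped d' as metrics are all hypothetical choices made for this example.

```python
import random
from statistics import NormalDist


def make_nback_sequence(n, length=24, n_matches=8,
                        alphabet="bcdfghjklnpqrstvwxyz", seed=0):
    """Build a letter sequence containing exactly n_matches n-back matches.

    All defaults (length, match count, alphabet) are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Matches can only occur from position n onward.
    match_positions = set(rng.sample(range(n, length), n_matches))
    seq = []
    for i in range(length):
        if i in match_positions:
            seq.append(seq[i - n])  # repeat the letter shown n steps earlier
        else:
            # Avoid accidental matches at non-match positions.
            choices = [c for c in alphabet if i < n or c != seq[i - n]]
            seq.append(rng.choice(choices))
    return seq


def expected_responses(seq, n):
    """True where the current letter equals the letter n steps back."""
    return [i >= n and seq[i] == seq[i - n] for i in range(len(seq))]


def score(responses, targets):
    """Return (hit_rate, false_alarm_rate, d_prime) for boolean responses."""
    hits = sum(r and t for r, t in zip(responses, targets))
    false_alarms = sum(r and not t for r, t in zip(responses, targets))
    n_targets = sum(targets)
    n_nontargets = len(targets) - n_targets
    hit_rate = hits / n_targets
    fa_rate = false_alarms / n_nontargets

    # Clip rates away from 0 and 1 so the z-transform stays finite
    # (one common correction; an assumption here).
    def clip(p, count):
        return min(max(p, 0.5 / count), 1 - 0.5 / count)

    z = NormalDist().inv_cdf
    d_prime = z(clip(hit_rate, n_targets)) - z(clip(fa_rate, n_nontargets))
    return hit_rate, fa_rate, d_prime


if __name__ == "__main__":
    n = 2
    seq = make_nback_sequence(n)
    targets = expected_responses(seq, n)
    # In a real run, `responses` would come from the model's per-letter answers.
    print(seq)
    print(score(targets, targets))
```

In use, each letter would be sent to the model in turn, its answer mapped to a boolean "match" or "no match" response, and the scores compared across n = 1, 2, 3 to see where performance drops.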