Scaling models of visual working memory to natural images

Christopher J Bates,George A Alvarez,Samuel J Gershman
DOI: https://doi.org/10.1038/s44271-023-00048-3
2024-01-03
Abstract:Over the last few decades, psychologists have developed precise quantitative models of human recall performance in visual working memory (VWM) tasks. However, these models are tailored to a particular class of artificial stimulus displays and simple feature reports from participants (e.g., the color or orientation of a simple object). Our work has two aims. The first is to build models that explain people's memory errors in continuous report tasks with natural images. Here, we use image generation algorithms to generate continuously varying response alternatives that differ from the stimulus image in natural and complex ways, in order to capture the richness of people's stored representations. The second aim is to determine whether models that do a good job of explaining memory errors with natural images also explain errors in the more heavily studied domain of artificial displays with simple items. We find that: (i) features taken from state-of-the-art deep encoders predict trial-level difficulty in natural images better than several reasonable baselines; and (ii) the same visual encoders can reproduce set-size effects and response bias curves in the artificial stimulus domains of orientation and color. Moving forward, our approach offers a scalable way to build a more generalized understanding of VWM representations by combining recent advances in both AI and cognitive modeling.
What problem does this paper attempt to address?