Abstract: When we perceive a scene, our brain processes various types of visual information simultaneously, ranging from sensory features, such as line orientations and colors, to categorical features, such as objects and their arrangements. Whereas the role of sensory and categorical visual representations in predicting subsequent memory has been studied using isolated objects, their impact on memory for complex scenes remains largely unknown. To address this gap, we conducted an fMRI study in which female and male participants encoded pictures of familiar scenes (e.g., an airport picture) and later recalled them, while rating the vividness of their visual recall. Outside the scanner, participants had to distinguish each seen scene (e.g., an airport picture) from three similar lures (e.g., three airport pictures). We modeled the sensory and categorical visual features of multiple scenes using both early and late layers of a deep convolutional neural network. Then, we applied representational similarity analysis to determine which brain regions represented stimuli in accordance with the sensory and categorical models. We found that categorical, but not sensory, representations predicted subsequent memory. Consistent with this result, only for the categorical model did the average recognition performance of each scene correlate positively with the average visual dissimilarity between that scene and its lures. These results strongly suggest that even in memory tests that ostensibly rely solely on visual cues (such as forced-choice visual recognition with similar distractors), memory decisions for scenes may be primarily influenced by categorical rather than sensory representations.

Significance Statement: Our memory for real-world scenes often comprises a tableau of complex visual features, but recent findings challenge the view that our memories of such stimuli rely on purely visual information. Instead, it appears that our memory for scenes is heavily influenced by higher-level categorical information. Analyzing cortical representations in regions responsive to both categorical and sensory features, we discovered that only the former can reliably predict memory outcomes. Moreover, the distinctiveness of scenes in terms of their categorical features among similar examples is positively associated with our ability to accurately recognize previously encountered scenes. In essence, this study sheds light on how our brains rely on categorical information to recognize natural scenes.
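
The two analysis steps named in the abstract, representational similarity analysis and the scene-to-lure dissimilarity measure, can be illustrated with a toy sketch. This is not the study's pipeline: the abstract does not specify the network, the layers, the distance metric, or the brain regions, so the sketch assumes correlation distance (1 minus Pearson r), Spearman rank correlation between dissimilarity matrices, and random placeholder data. All function and variable names are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def rank(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))

def rdm_upper(patterns):
    """Upper triangle of a dissimilarity matrix (assumed metric: 1 - Pearson r)."""
    n = len(patterns)
    return [1 - pearson(patterns[i], patterns[j])
            for i in range(n) for j in range(i + 1, n)]

def rsa_score(model_features, brain_patterns):
    """Second-order similarity between a model RDM and a brain RDM."""
    return spearman(rdm_upper(model_features), rdm_upper(brain_patterns))

def mean_lure_dissimilarity(target, lures):
    """Average dissimilarity between one scene and its lures in a model's feature space."""
    return sum(1 - pearson(target, lure) for lure in lures) / len(lures)

# Placeholder data: 8 scenes, 20 model units (e.g. a late layer), 30 voxels in one region.
random.seed(0)
late_layer = [[random.gauss(0, 1) for _ in range(20)] for _ in range(8)]
roi_patterns = [[random.gauss(0, 1) for _ in range(30)] for _ in range(8)]
print("model-brain RSA score:", rsa_score(late_layer, roi_patterns))
```

In this framing, a sensory model would take `model_features` from an early layer and a categorical model from a late layer. Under the same assumptions, the per-scene value from `mean_lure_dissimilarity` could then be correlated with average recognition accuracy across scenes.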

Feature representations useful for predicting image memorability

Modeling Visual Memorability Assessment with Autoencoders Reveals Characteristics of Memorable Images

Learning Computational Models of Video Memorability from fMRI Brain Imaging

Neural representations of the perception of handwritten digits and visual objects from a convolutional neural network compared to humans

Predicting memorability of face photographs with deep neural networks

Learning Low-Rank Sparse Representations With Robust Relationship Inference for Image Memorability Prediction

Neural Encoding for Image Recall: Human-Like Memory

Image Memorability Prediction Model Based on Low-Rank Representation Learning

AMNet: Memorability Estimation with Attention

Deep Convolutional Neural Networks Outperform Feature-Based But Not Categorical Models in Explaining Object Similarity Judgments

What Images are More Memorable to Machines?

Facial Memorability Prediction Fusing Geometric and Texture Features

Convolutional Neural Networks Exploiting Attributes of Biological Neurons

Media Memorability Prediction Based on Machine Learning

What Makes Natural Scene Memorable?

CNN with large memory layers

Memory for artwork is predictable

Linking Neural Activity to Image Stimuli Through Convolutional Neural Networks: A Methodology

A novel feature-scrambling approach reveals the capacity of convolutional neural networks to learn spatial relations

Dissociable Neural Representations of Adversarially Perturbed Images in Convolutional Neural Networks and the Human Brain

Visual recognition memory of scenes is driven by categorical, not sensory, visual representations.