Manipulating and measuring variation in deep neural network (DNN) representations of objects

Jason K Chow,Thomas J Palmeri
DOI: https://doi.org/10.1016/j.cognition.2024.105920
IF: 4.011
Cognition
Abstract:We explore how DNNs can be used to develop a computational understanding of individual differences in high-level visual cognition given their ability to generate rich meaningful object representations informed by their architecture, experience, and training protocols. As a first step to quantifying individual differences in DNN representations, we systematically explored the robustness of a variety of representational similarity measures: Representational Similarity Analysis (RSA), Centered Kernel Alignment (CKA), and Projection-Weighted Canonical Correlation Analysis (PWCCA), with an eye to how these measures are used in cognitive science, cognitive neuroscience, and vision science. To manipulate object representations, we next created a large set of models varying in random initial weights and random training image order, training image frequencies, training category frequencies, and model size and architecture and measured the representational variation caused by each manipulation. We examined both small (All-CNN-C) and commonly-used large (VGG and ResNet) DNN architectures. To provide a comparison for the magnitude of representational differences, we established a baseline based on the representational variation caused by image-augmentation techniques used to train those DNNs. We found that variation in model randomization and model size never exceeded baseline. By contrast, differences in training image frequency and training category frequencies caused representational variation that exceeded baseline, with training category frequency manipulations exceeding baseline earlier in the networks. These findings provide insights into the magnitude of representational variations that can be expected with a range of manipulations and provide a springboard for further exploration of systematic model variations aimed at modeling individual differences in high-level visual cognition.
What problem does this paper attempt to address?