Convolutional architectures are cortex-aligned de novo

Atlas Kazemian, Eric Elmoznino, Michael F. Bonner

DOI: https://doi.org/10.1101/2024.05.10.593623

2024-05-15

Abstract: What underlies the emergence of cortex-aligned representations in deep neural network models of vision? The success of widely varied architectures has motivated the prevailing hypothesis that large-scale pre-training is the primary factor underlying the similarities between brains and neural networks. Here, we challenge this view by revealing the role of architectural inductive biases in models with minimal training. We examined networks with varied architectures but no pre-training and quantified their ability to predict image representations in the visual cortices of both monkeys and humans. We found that cortex-aligned representations emerge in convolutional architectures that combine two key manipulations of dimensionality: compression in the spatial domain and expansion in the feature domain. We further show that the inductive biases of convolutional architectures are critical for obtaining performance gains from feature expansion: dimensionality manipulations were relatively ineffective in other architectures and in convolutional models with targeted lesions. Our findings suggest that the architectural constraints of convolutional networks are sufficiently close to the constraints of biological vision to allow many aspects of cortical visual representation to emerge even before synaptic connections have been tuned through experience.
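The two dimensionality manipulations named in the abstract can be illustrated with a minimal NumPy sketch of an untrained convolutional stack: the filters are random and never trained, max pooling compresses the spatial grid, and the channel count grows from stage to stage. The layer widths, 3x3 kernels, 2x2 pooling, He-style Gaussian initialization, and the helper names `random_conv_layer` and `untrained_cnn` are hypothetical choices for illustration, not the architecture used in the paper. The `relu` flag only loosely mimics a nonlinearity lesion.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def random_conv_layer(x, n_filters, kernel_size=3, pool=2, relu=True, rng=None):
    """One untrained convolutional stage: random filters -> ReLU -> max pooling.

    x has shape (channels, height, width). The output has shape
    (n_filters, (height - kernel_size + 1) // pool, (width - kernel_size + 1) // pool).
    """
    rng = np.random.default_rng() if rng is None else rng
    c = x.shape[0]
    # Random, never-trained filters; He-style scaling by fan-in is an assumption.
    fan_in = c * kernel_size * kernel_size
    weights = rng.normal(0.0, np.sqrt(2.0 / fan_in),
                         size=(n_filters, c, kernel_size, kernel_size))
    # Spatial locality: each output unit sees only a k x k window of the input.
    patches = sliding_window_view(x, (kernel_size, kernel_size), axis=(1, 2))
    # patches: (c, h', w', k, k); contract over channels and kernel offsets.
    out = np.einsum("chwij,fcij->fhw", patches, weights, optimize=True)
    if relu:
        out = np.maximum(out, 0.0)  # pointwise nonlinearity (set relu=False to lesion it)
    # Max pooling compresses the spatial domain by `pool` along each axis.
    f, h, w = out.shape
    h2, w2 = h // pool, w // pool
    out = out[:, :h2 * pool, :w2 * pool].reshape(f, h2, pool, w2, pool)
    return out.max(axis=(2, 4))


def untrained_cnn(image, widths=(64, 256, 1024), relu=True, seed=0):
    """Stack of random conv stages: channels grow (feature expansion)
    while pooling shrinks the spatial grid (spatial compression)."""
    rng = np.random.default_rng(seed)
    x = image
    for n_filters in widths:
        x = random_conv_layer(x, n_filters, relu=relu, rng=rng)
    return x.reshape(-1)  # flattened features for a downstream encoding model


# Usage: a random stand-in for a 3-channel 32 x 32 image.
image = np.random.default_rng(1).random((3, 32, 32))
features = untrained_cnn(image)
print(features.shape)  # (4096,): 1024 channels on a 2 x 2 grid
```

Going from a 32 x 32 grid with 3 channels to a 2 x 2 grid with 1024 channels shows both manipulations at once: far fewer spatial positions, far more feature dimensions per position.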

Animal Behavior and Cognition

What problem does this paper attempt to address?

This paper examines how convolutional architectures in deep neural networks (DNNs) can produce representations similar to those of the visual cortex without extensive pre-training. The researchers challenge the widely held view that large-scale pre-training is the main factor behind the similarity between DNNs and the brain. They quantified how well networks with different architectures but no pre-training predict image representations in the visual cortices of monkeys and humans. They found that cortex-aligned representations emerge in convolutional architectures that combine two key manipulations of dimensionality: compression in the spatial domain and expansion in the feature domain. They further showed that the inductive biases of convolutional architectures are crucial for obtaining gains from feature expansion, whereas dimensionality manipulations were relatively ineffective in other architectures and in convolutional models with targeted lesions. These findings suggest that, even before synaptic connections are tuned through experience, the architectural constraints of convolutional networks are close enough to those of biological vision for many aspects of cortical visual representation to emerge. To demonstrate this, the researchers increased the number of random features in untrained convolutional, fully-connected, and Transformer networks and measured how their encoding performance, that is, their ability to predict image responses in monkey and human visual cortex, changed. Although all architectures benefited from dimensionality expansion, the gains for convolutional architectures were significantly greater than for the others, even when dimensionality was matched. The paper also showed that nonlinear activation functions and the spatial locality of convolutional filters are critical to the performance of convolutional networks.
Encoding performance decreased significantly when these components were removed. In summary, the study highlights how the architectural biases of convolutional networks give rise to representations similar to those of the visual cortex, even without extensive training. It suggests that although pre-training may be sufficient to induce brain-aligned representations in a variety of architectures, convolutional architectures already show a considerable degree of brain alignment in their initial, untrained state.
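Encoding performance of the kind described above is often estimated by fitting a regularized linear mapping from network activations to measured responses and scoring predictions on held-out images. The sketch below assumes cross-validated ridge regression scored by Pearson correlation; this is one common approach, not necessarily the paper's exact procedure. The names `ridge_fit` and `encoding_score`, the regularization strength, and the synthetic data in the usage lines are all hypothetical.

```python
import numpy as np


def ridge_fit(X, Y, alpha):
    """Closed-form ridge weights W minimizing ||X W - Y||^2 + alpha * ||W||^2."""
    n_samples, n_features = X.shape
    if n_features > n_samples:
        # Dual form, cheaper when feature expansion makes X wide:
        # W = X^T (X X^T + alpha I)^{-1} Y
        gram = X @ X.T
        return X.T @ np.linalg.solve(gram + alpha * np.eye(n_samples), Y)
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def encoding_score(X, Y, alpha=1.0, n_folds=5, seed=0):
    """Cross-validated encoding performance.

    X: (n_images, n_features) network activations.
    Y: (n_images, n_channels) measured responses (e.g. neurons or voxels).
    Returns the mean, over response channels, of the Pearson r between
    held-out predictions and observed responses.
    """
    n = X.shape[0]
    order = np.random.default_rng(seed).permutation(n)
    preds = np.empty(Y.shape, dtype=float)
    for test in np.array_split(order, n_folds):
        train = np.setdiff1d(order, test)
        # Standardize with training-fold statistics only, to avoid leakage.
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0) + 1e-8
        y_mu = Y[train].mean(axis=0)
        W = ridge_fit((X[train] - mu) / sd, Y[train] - y_mu, alpha)
        preds[test] = ((X[test] - mu) / sd) @ W + y_mu
    # Pearson r per response channel, then averaged across channels.
    pc = preds - preds.mean(axis=0)
    yc = Y - Y.mean(axis=0)
    r = (pc * yc).sum(axis=0) / np.sqrt((pc ** 2).sum(axis=0) * (yc ** 2).sum(axis=0))
    return float(r.mean())


# Usage with synthetic data: vary the number of untrained random features.
rng = np.random.default_rng(0)
stimuli = rng.normal(size=(300, 20))                     # synthetic stimulus descriptors
responses = np.tanh(stimuli @ rng.normal(size=(20, 8)))  # synthetic "neural" responses
for width in (10, 100, 1000):
    # Untrained random features: fixed random projection followed by ReLU.
    feats = np.maximum(stimuli @ rng.normal(size=(20, width)), 0.0)
    print(width, round(encoding_score(feats, responses, alpha=10.0), 3))
```

Solving the dual problem when features outnumber images keeps each fit cheap even as the random feature count grows, which is exactly the regime that feature expansion creates.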
