Abstract: Recognizing faces regardless of their viewpoint is critical for social interactions. Traditional theories hold that view-selective early visual representations gradually become tolerant to viewpoint changes along the ventral visual hierarchy. Newer theories, based on single-neuron electrophysiological recordings in monkeys, suggest a three-stage architecture in which an intermediate face-selective patch abruptly achieves invariance to mirror-symmetric face views. Human studies combining neuroimaging and multivariate pattern analysis (MVPA) have provided convergent evidence of view-selectivity in early visual areas. However, they have reached contradictory conclusions about whether humans have a mirror-symmetric representation like the one observed in macaques. We believe these contradictions arise from low-level stimulus confounds and data analysis choices. To probe for low-level confounds, we analyzed images from two face databases. Analyses of image luminance and contrast revealed biases across face views that are described by even polynomials, i.e., that are mirror-symmetric. To explain major trends across neuroimaging studies, we constructed a network model incorporating three constraints: cortical magnification, convergent feedforward projections, and interhemispheric connections. Given the identified low-level biases, we show that a gradual increase of interhemispheric connections across network layers is sufficient to replicate view-tuning in early processing stages and mirror-symmetry in later stages. Two data analysis decisions, the choice of pattern dissimilarity measure and data recentering, accounted for the inconsistent observation of mirror-symmetry across prior studies. Pattern analyses of human fMRI data (of either sex) revealed biases compatible with our model.
The model provides a unifying explanation of MVPA studies of viewpoint selectivity and suggests that observations of mirror-symmetry in humans originate from ineffectively normalized signal imbalances across face views.

Significance Statement: The recognition of identity regardless of viewpoint is critical for social interactions. In primates, the representation of mirror-symmetric face views is thought to be a key intermediate processing step leading from strictly view-tuned to viewpoint-invariant representations. Human neuroimaging studies, however, have reached contradictory conclusions regarding the representation of viewpoint information in face-selective areas, despite being concordant in early visual areas. We show that low-level stimulus confounds and data analysis choices explain these contradictory observations. We propose a network model that replicates observations of view-tuning in early processing stages regardless of analysis choices. The variable observation of mirror-symmetry in later stages is explained by the choice of pattern dissimilarity measure and data recentering. Analyses of fMRI data confirmed biases broadly compatible with our model.
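A toy simulation can show how a mirror-symmetric signal imbalance, combined with the choice of dissimilarity measure and recentering, could create or hide apparent mirror-symmetry. This is a minimal sketch under stated assumptions and not the authors' network model or analysis pipeline. Every parameter is hypothetical and chosen only for illustration: five views, 361 simulated voxels with Gaussian view tuning, a uniform offset that grows as an even polynomial of view angle, and recentering by subtracting each pattern's mean across voxels. The sketch compares a mirror pair (-45°, +45°) with a non-mirror pair (-90°, 0°) that is equally far apart in angle, so a purely view-tuned code should place both pairs at the same distance.

```python
import numpy as np

# Hypothetical toy parameters, chosen for illustration only (not from the study).
VIEWS = np.array([-90.0, -45.0, 0.0, 45.0, 90.0])  # face views in degrees
PREFS = np.arange(-180.0, 181.0, 1.0)               # preferred view of each simulated voxel
SIGMA = 30.0                                         # Gaussian tuning width in degrees


def simulate_patterns(offset_gain: float) -> np.ndarray:
    """Return a (views x voxels) matrix of simulated responses.

    Each voxel is Gaussian-tuned to its preferred view (a purely view-tuned
    code). A spatially uniform offset that is an even polynomial of view
    angle stands in for a mirror-symmetric low-level bias such as luminance.
    """
    tuned = np.exp(-(VIEWS[:, None] - PREFS[None, :]) ** 2 / (2 * SIGMA**2))
    offset = offset_gain * (VIEWS / 90.0) ** 2  # even: offset(-v) == offset(v)
    return tuned + offset[:, None]


def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))


def correlation_distance(a: np.ndarray, b: np.ndarray) -> float:
    # 1 - Pearson r; unaffected by adding a constant to either pattern.
    return float(1.0 - np.corrcoef(a, b)[0, 1])


def recenter(patterns: np.ndarray) -> np.ndarray:
    # One possible recentering (an assumption here): subtract each pattern's
    # own mean across voxels, which removes any uniform offset.
    return patterns - patterns.mean(axis=1, keepdims=True)


def row(view: float) -> int:
    # Index of a given view angle in VIEWS.
    return int(np.flatnonzero(VIEWS == view)[0])


def mirror_vs_nonmirror(patterns: np.ndarray, dist) -> tuple[float, float]:
    """Distances for a mirror pair (-45, +45) and a non-mirror pair (-90, 0).

    Both pairs are 90 degrees apart, so a purely view-tuned code predicts
    equal distances; a smaller mirror-pair distance mimics mirror-symmetry.
    """
    mirror = dist(patterns[row(-45.0)], patterns[row(45.0)])
    non_mirror = dist(patterns[row(-90.0)], patterns[row(0.0)])
    return mirror, non_mirror


if __name__ == "__main__":
    P = simulate_patterns(offset_gain=2.0)
    for label, data, dist in [
        ("Euclidean", P, euclidean),
        ("correlation", P, correlation_distance),
        ("Euclidean, recentered", recenter(P), euclidean),
    ]:
        m, n = mirror_vs_nonmirror(data, dist)
        print(f"{label:>22}: mirror pair = {m:.3f}, non-mirror pair = {n:.3f}")
```

In this toy, the even offset makes the mirror pair appear much closer than the non-mirror pair under Euclidean distance. Correlation distance and Euclidean distance on recentered patterns both discard the uniform offset, so they show no such asymmetry. Other recentering schemes behave differently. Subtracting the mean pattern across conditions from every voxel, for example, leaves Euclidean distances unchanged, because all patterns shift by the same vector.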
