Abstract: The primate visual system achieves remarkable visual object recognition performance even in brief presentations and under changes to object exemplar, geometric transformations, and background variation (a.k.a. core visual object recognition). This remarkable performance is mediated by the representation formed in inferior temporal (IT) cortex. In parallel, recent advances in machine learning have led to ever higher performing models of object recognition using artificial deep neural networks (DNNs). It remains unclear, however, whether the representational performance of DNNs rivals that of the brain. To accurately produce such a comparison, a major difficulty has been a unifying metric that accounts for experimental limitations such as the amount of noise, the number of neural recording sites, and the number of trials, and computational limitations such as the complexity of the decoding classifier and the number of classifier training examples. In this work we perform a direct comparison that corrects for these experimental limitations and computational considerations. As part of our methodology, we propose an extension of "kernel analysis" that measures the generalization accuracy as a function of representational complexity. Our evaluations show that, unlike previous bio-inspired models, the latest DNNs rival the representational performance of IT cortex on this visual object recognition task. Furthermore, we show that models that perform well on measures of representational performance also perform well on measures of representational similarity to IT and on measures of predicting individual IT multi-unit responses. Whether these DNNs rely on computational mechanisms similar to the primate visual system is yet to be determined, but, unlike all previous bio-inspired models, that possibility cannot be ruled out merely on representational performance grounds.

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

This paper asks whether the performance of deep neural networks (DNNs) on core visual object recognition tasks can match that of the primate inferior temporal (IT) cortex. Specifically, the paper focuses on the following aspects:

1. **Core Visual Object Recognition Task**: Primates (including humans and macaques) can recognize object categories within a short time (e.g., within 100 milliseconds), maintaining high accuracy even with variations in object exemplars, geometric transformations (position, scale, pose), and background changes. This ability is referred to as "core visual object recognition."
2. **Representational Performance of IT Cortex**: The IT cortex plays a crucial role in the primate visual system, forming representations that are highly selective for object identity and tolerant to irrelevant parameters (such as position, scale, pose, and background). These representations make the object classification problem relatively simple, solvable by a linear classifier.
3. **Representational Performance of Deep Neural Networks**: In recent years, deep neural networks have made significant progress on object recognition benchmarks in machine learning. However, whether the representational performance of these models can match that of the IT cortex remains an unresolved question.

### Research Methods

To make this comparison accurately, the paper proposes a new method that addresses several major issues in previous studies:

1. **Correction of Experimental Limitations**: Previous comparisons did not account for factors such as experimental noise, the number of recorded neural sites, and the number of stimulus presentations. This paper's method corrects for these factors so that neural and model representations are compared on equal terms.
2. **Relationship of Classifier Complexity**: Previous comparisons used classifiers of fixed complexity and did not explore the relationship between classifier complexity and decision boundary accuracy. This paper uses an extended "kernel analysis" method to measure representational accuracy at different complexities.
3. **Variations in Neural and Model Space**: Previous comparisons did not measure variations in neural and model space related to category-level object classification. This paper's method provides new insights by analyzing absolute representational performance.
4. **Large-Scale Dataset**: This paper uses a dataset of 1960 images, an order of magnitude larger than those in previous studies, covering a wide range of variations in object exemplars, geometric transformations, and backgrounds, thereby better evaluating the representational performance of the IT cortex.

### Main Findings

1. **Performance of Latest DNNs**: The latest deep neural networks have representational performance on core visual object recognition tasks comparable to that of the IT cortex, whereas previous biologically inspired models lag far behind.
2. **Similarity Between Models and IT Cortex**: Beyond representational performance, the latest DNNs also do well at predicting IT multi-unit responses and on measures of representational similarity to IT.
3. **Similarity in Computational Mechanisms**: Although it is still uncertain whether these DNNs rely on computational mechanisms similar to those of the primate visual system, this possibility cannot be ruled out on the grounds of representational performance alone.

### Conclusion

The paper demonstrates that the latest deep neural networks have achieved performance on core visual object recognition tasks comparable to that of the primate IT cortex, providing new perspectives for understanding primate visual processing. However, further research is needed to explore whether these models truly share the computational mechanisms of the primate visual system.
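The idea under Research Methods of measuring accuracy as a function of classifier complexity can be illustrated with a small sketch. It is not the paper's exact procedure: it assumes kernel ridge regression with a Gaussian kernel, treats the inverse regularization strength `1 / lam` as a stand-in for complexity, and uses the standard closed-form leave-one-out estimate for linear smoothers as the generalization accuracy. All function names, the kernel width `sigma`, and the synthetic data are hypothetical choices for illustration.

```python
import numpy as np


def gaussian_kernel(X, sigma):
    """Gaussian (RBF) kernel matrix between all rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def loo_accuracy(K, Y, lam):
    """Leave-one-out accuracy of kernel ridge regression on one-hot targets Y.

    Uses the closed form for linear smoothers: with H = K (K + lam I)^-1,
    the held-out prediction for sample i is (H y - H_ii y_i) / (1 - H_ii).
    """
    n = K.shape[0]
    H = np.linalg.solve(K + lam * np.eye(n), K)  # K and (K + lam I)^-1 commute
    h = np.diag(H)[:, None]
    loo_pred = (H @ Y - h * Y) / (1.0 - h)
    return float(np.mean(np.argmax(loo_pred, axis=1) == np.argmax(Y, axis=1)))


def accuracy_vs_complexity(X, labels, lambdas, sigma):
    """Return (complexity, accuracy) pairs, with complexity taken as 1 / lam."""
    classes, idx = np.unique(labels, return_inverse=True)
    Y = np.eye(len(classes))[idx]  # one-hot encode the category labels
    K = gaussian_kernel(X, sigma)
    return [(1.0 / lam, loo_accuracy(K, Y, lam)) for lam in lambdas]


# Synthetic stand-in for a representation: rows are images, columns are
# features (for example neural sites or model units); two well-separated classes.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(-3, 1, (30, 5)), rng.normal(3, 1, (30, 5))])
y_demo = np.array([0] * 30 + [1] * 30)

for complexity, acc in accuracy_vs_complexity(X_demo, y_demo, [1e2, 1.0, 1e-2], sigma=4.0):
    print(f"complexity={complexity:g}  leave-one-out accuracy={acc:.3f}")
```

Applying the same function to a neural representation and to a model representation, with matched numbers of features and images, would give two curves that can be compared at each complexity level. Matching those numbers is the kind of correction the summary describes, although how the paper implements it is not shown here.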

Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition

Seeing eye-to-eye? A comparison of object recognition performance in humans and deep convolutional neural networks under image manipulation

The Neural Representation Benchmark and its Evaluation on Brain and Machine

The Quest for an Integrated Set of Neural Mechanisms Underlying Object Recognition in Primates

Explaining face representation in the primate brain using different computational models

Improved object recognition using neural networks trained to mimic the brain's statistical properties

Performance-optimized deep neural networks are evolving into worse models of inferotemporal visual cortex

Deep Neural Networks predict Hierarchical Spatio-temporal Cortical Dynamics of Human Visual Object Recognition

Comparison Against Task Driven Artificial Neural Networks Reveals Functional Organization of Mouse Visual Cortex

Brain-inspired Models for Visual Object Recognition: an Overview

Probing neural representations of scene perception in a hippocampally dependent task using artificial neural networks

Comparing object recognition in humans and deep convolutional neural networks -- An eye tracking study

Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like?

How well do models of visual cortex generalize to out of distribution samples?

Equivalent processing of facial expression and identity by macaque visual system and task-optimized neural network

Examining Representational Similarity in ConvNets and the Primate Visual Cortex

Deep Spiking Neural Networks with High Representation Similarity Model Visual Pathways of Macaque and Mouse

Deep Neural Networks and Visuo-Semantic Models Explain Complementary Components of Human Ventral-Stream Representational Dynamics

Deep Reinforcement Learning Models Predict Visual Responses in the Brain: A Preliminary Result

Decoding Neural Responses in Mouse Visual Cortex through a Deep Neural Network

Deep neural networks: a new framework for modelling biological vision and brain information processing