Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition

Charles F. Cadieu, Ha Hong, Daniel L. K. Yamins, Nicolas Pinto, Diego Ardila, Ethan A. Solomon, Najib J. Majaj, James J. DiCarlo
DOI: https://doi.org/10.1371/journal.pcbi.1003963
2014-06-13
Abstract: The primate visual system achieves remarkable visual object recognition performance even in brief presentations and under changes to object exemplar, geometric transformations, and background variation (a.k.a. core visual object recognition). This remarkable performance is mediated by the representation formed in inferior temporal (IT) cortex. In parallel, recent advances in machine learning have led to ever higher performing models of object recognition using artificial deep neural networks (DNNs). It remains unclear, however, whether the representational performance of DNNs rivals that of the brain. To accurately produce such a comparison, a major difficulty has been a unifying metric that accounts for experimental limitations such as the amount of noise, the number of neural recording sites, and the number of trials, and computational limitations such as the complexity of the decoding classifier and the number of classifier training examples. In this work we perform a direct comparison that corrects for these experimental limitations and computational considerations. As part of our methodology, we propose an extension of "kernel analysis" that measures the generalization accuracy as a function of representational complexity. Our evaluations show that, unlike previous bio-inspired models, the latest DNNs rival the representational performance of IT cortex on this visual object recognition task. Furthermore, we show that models that perform well on measures of representational performance also perform well on measures of representational similarity to IT and on measures of predicting individual IT multi-unit responses. Whether these DNNs rely on computational mechanisms similar to the primate visual system is yet to be determined, but, unlike all previous bio-inspired models, that possibility cannot be ruled out merely on representational performance grounds.
Neurons and Cognition, Neural and Evolutionary Computing
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper asks whether the representational performance of deep neural networks (DNNs) on core visual object recognition can match that of the primate inferior temporal (IT) cortex. It focuses on three aspects:

1. **Core Visual Object Recognition Task**: Primates (including humans and macaques) can recognize object categories within a short time (e.g., within 100 milliseconds) and remain accurate despite variation in object exemplar, geometric transformations (position, scale, pose), and background. This ability is referred to as "core visual object recognition."
2. **Representational Performance of IT Cortex**: The IT cortex plays a crucial role in the primate visual system. Its representations are highly selective for object identity and tolerant to irrelevant parameters (such as position, scale, pose, and background). These representations make object classification relatively simple, to the point that a linear classifier can solve it.
3. **Representational Performance of Deep Neural Networks**: In recent years, deep neural networks have made significant progress on object recognition benchmarks in machine learning. Whether the representational performance of these models matches that of the IT cortex has remained an open question.

### Research Methods

To make this comparison accurate, the paper proposes a method that addresses several shortcomings of previous studies:

1. **Correction of Experimental Limitations**: Previous comparisons did not account for experimental noise, the number of recorded neural sites, or the number of stimulus presentations. This paper's method corrects for these factors so that neural and model representations are compared on equal terms.
2. **Classifier Complexity**: Previous comparisons used classifiers of fixed complexity and did not explore how classifier complexity relates to the accuracy of the decision boundary. This paper uses an extended "kernel analysis" method that measures generalization accuracy as a function of representational complexity.
3. **Variations in Neural and Model Space**: Previous comparisons did not measure variations in neural and model space related to category-level object classification. This paper's method provides new insights by analyzing absolute representational performance.
4. **Large-Scale Dataset**: This paper uses a dataset an order of magnitude larger than those in previous studies (1960 images), covering a wide range of variation in object exemplar, geometric transformations, and background, which allows a better evaluation of the representational performance of the IT cortex.

### Main Findings

1. **Performance of Latest DNNs**: On core visual object recognition, the latest deep neural networks reach representational performance comparable to that of the IT cortex, whereas previous biologically inspired models lag far behind.
2. **Similarity Between Models and IT Cortex**: Models that perform well on representational performance also perform well at predicting individual IT multi-unit responses and on measures of representational similarity to IT.
3. **Similarity in Computational Mechanisms**: It is still undetermined whether these DNNs rely on computational mechanisms similar to those of the primate visual system, but that possibility cannot be ruled out on representational performance grounds alone.

### Conclusion

The paper demonstrates that the latest deep neural networks achieve performance on core visual object recognition comparable to that of the primate IT cortex, providing new perspectives for understanding primate visual processing. Further research is needed to determine whether these models actually use the computational mechanisms of the primate visual system.
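
The idea behind measuring accuracy as a function of complexity, described under Research Methods, can be illustrated with a small sketch. This is not the authors' implementation: it assumes kernel ridge regression with a Gaussian kernel, one-vs-all targets, and leave-one-out error as the generalization measure, with 1/lambda standing in for the complexity axis. The function names, the kernel width `sigma`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kernel(X, sigma):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def loo_error_curve(X, y, lambdas, sigma=1.0):
    """Leave-one-out classification error of kernel ridge regression,
    evaluated once per regularization strength in `lambdas`.

    Smaller lambda (larger 1/lambda) allows a more complex decision
    boundary, so the returned array traces error against complexity.
    """
    classes = np.unique(y)
    # One-vs-all targets: +1 for the true class, -1 otherwise.
    Y = np.where(y[:, None] == classes[None, :], 1.0, -1.0)
    K = gaussian_kernel(X, sigma)
    evals, evecs = np.linalg.eigh(K)
    evals = np.maximum(evals, 0.0)
    errors = []
    for lam in lambdas:
        # Hat matrix H = K (K + lam I)^-1, built from the eigendecomposition.
        shrink = evals / (evals + lam)
        H = (evecs * shrink) @ evecs.T
        h = np.diag(H)[:, None]
        # Closed-form leave-one-out predictions for ridge-type smoothers.
        loo_scores = (H @ Y - h * Y) / (1.0 - h)
        predicted = classes[np.argmax(loo_scores, axis=1)]
        errors.append(np.mean(predicted != y))
    return np.array(errors)


# Synthetic stand-ins for two "representations" of the same 120 images:
# one where categories form clusters, one that carries no category signal.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 40)
X_clustered = (4.0 * np.eye(3, 5))[y] + 0.3 * rng.normal(size=(120, 5))
X_noise = rng.normal(size=(120, 5))
lambdas = np.logspace(2, -3, 6)  # ordered from low to high complexity

for name, X in [("clustered", X_clustered), ("noise", X_noise)]:
    print(name, loo_error_curve(X, y, lambdas, sigma=2.0))
```

Under this setup, a representation that separates the categories reaches low error even at low complexity, while an uninformative one stays near chance however complex the classifier is allowed to be. Comparing whole curves avoids tying the conclusion to one fixed classifier.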