Abstract: “Number sense”, the ability to quickly estimate quantities of objects in a visual scene, is present in humans and many other animals, and has recently been demonstrated in biologically inspired vision models, even before training. However, real-world number perception requires abstraction from the properties of individual objects and their contexts, in contrast to the simplified dot patterns used in previous studies. Using novel, synthetically generated photorealistic stimuli, we discovered that deep convolutional neural networks optimized for object recognition can encode numerical information across varying object and scene identities in their distributed activity patterns. In contrast, untrained networks failed to discriminate numbers, and appeared to encode low-level visual summary statistics of scenes rather than the number of discrete objects per se. These results caution against using untrained networks to model early numerical abilities and highlight the need to use more complex stimuli to understand the mechanisms behind the brain’s visual number sense.
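
A toy example shows why a low-level summary statistic can pass for a number code when stimuli are simple. This is a hypothetical illustration, not the paper's stimuli or analysis: the helper `make_scene`, the canvas size, and the object sizes are assumptions, and mean image intensity stands in for a "low-level visual summary statistic". With a fixed object size, mean intensity rises with the number of objects, so a readout of that statistic would appear to track number. With total object area held constant, it no longer does.

```python
import numpy as np

def make_scene(n_objects, side, canvas=64, rng=None):
    """Hypothetical toy stimulus: n_objects non-touching bright squares on a dark canvas."""
    rng = np.random.default_rng() if rng is None else rng
    img = np.zeros((canvas, canvas))
    placed = 0
    while placed < n_objects:
        r, c = rng.integers(0, canvas - side, size=2)
        # Reject positions that would overlap or touch an existing square (1-pixel margin).
        if img[max(r - 1, 0):r + side + 1, max(c - 1, 0):c + side + 1].any():
            continue
        img[r:r + side, c:c + side] = 1.0
        placed += 1
    return img

def mean_intensity(img):
    """A crude low-level summary statistic: the average pixel value."""
    return float(img.mean())

rng = np.random.default_rng(0)
numbers = [1, 4, 16]

# Condition 1: fixed object size (4x4 pixels), so total bright area grows with number.
fixed_size = [mean_intensity(make_scene(n, 4, rng=rng)) for n in numbers]

# Condition 2: total bright area held at 64 pixels (sides 8, 4, 2), so area no longer tracks number.
matched_area = [mean_intensity(make_scene(n, 8 // int(np.sqrt(n)), rng=rng)) for n in numbers]

print("fixed size:  ", fixed_size)
print("matched area:", matched_area)
```

Matching total area does not remove every confound: in the matched condition, total contour length doubles with each step (perimeters 32, 64, 128 pixels), which is one reason simple dot-pattern controls are hard to make complete.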

What problem does this paper attempt to address?

The paper addresses how deep convolutional neural networks encode the number of objects in a visual scene, and whether this encoding is robust to changes in object and scene identity. Specifically, the study uses synthetic photorealistic image stimuli to test whether these networks still encode numerical information when the objects and backgrounds change. It also compares trained and untrained networks in this respect and analyzes the role of low-level image statistics in numerosity encoding. The study finds that networks trained for object recognition encode numerosity in a way that is, to some degree, independent of changes in objects and backgrounds, whereas untrained networks do not. Furthermore, numerical information is not carried solely by artificial units tuned to simple dot patterns but is distributed across a broader population of units. Finally, by also testing simplified versions of the stimuli, the study confirms that complex stimuli are needed to reveal the differences between trained and untrained networks.
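
The robustness test described above can be sketched as a leave-one-identity-out decoding analysis. This is a minimal sketch under stated assumptions, not the paper's pipeline: activations are simulated rather than taken from a network, the parameters (`N_UNITS`, `N_IDS`, `REPS`, noise scales) and helper names are hypothetical, and a nearest-centroid classifier stands in for whatever decoder the authors used. Each simulated "identity" (an object/scene combination) adds its own offset to the population response. In the invariant population, each number evokes the same distributed pattern for every identity; in the non-invariant population, that pattern is redrawn per identity. A decoder trained on some identities and tested on a held-out one should succeed only in the first case.

```python
import numpy as np

rng = np.random.default_rng(0)
N_UNITS, N_IDS, REPS = 200, 4, 20   # hypothetical: population size, identities, repeats
NUMBERS = [1, 2, 3, 4, 5, 6]        # hypothetical numerosity range

def simulate(invariant):
    """Simulated activations X (samples x units), number labels y, identity labels g."""
    shared = {n: rng.normal(size=N_UNITS) for n in NUMBERS}
    X, y, g = [], [], []
    for ident in range(N_IDS):
        offset = rng.normal(scale=2.0, size=N_UNITS)  # identity-specific response
        for n in NUMBERS:
            # Invariant code: one pattern per number; otherwise a new pattern per identity.
            pattern = shared[n] if invariant else rng.normal(size=N_UNITS)
            for _ in range(REPS):
                X.append(offset + pattern + rng.normal(size=N_UNITS))
                y.append(n)
                g.append(ident)
    return np.array(X), np.array(y), np.array(g)

def center_within_identity(X, g):
    """Subtract each identity's mean response (assumes numbers are balanced per identity)."""
    Xc = X.copy()
    for ident in np.unique(g):
        Xc[g == ident] -= X[g == ident].mean(axis=0)
    return Xc

def cross_identity_accuracy(X, y, g):
    """Leave-one-identity-out nearest-centroid decoding of number; returns mean accuracy."""
    X = center_within_identity(X, g)
    accs = []
    for held_out in np.unique(g):
        train, test = g != held_out, g == held_out
        labels = np.unique(y[train])
        centroids = np.stack([X[train & (y == n)].mean(axis=0) for n in labels])
        # Squared Euclidean distance from each test sample to each class centroid.
        d = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        pred = labels[d.argmin(axis=1)]
        accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs))

for invariant in (True, False):
    X, y, g = simulate(invariant)
    acc = cross_identity_accuracy(X, y, g)
    print(f"invariant={invariant}: accuracy={acc:.2f} (chance={1 / len(NUMBERS):.2f})")
```

Centering within each identity removes the identity-specific offset before decoding. It relies on every number appearing equally often per identity, which holds in this balanced simulation but is an assumption for real data.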

Trained deep neural network models of the ventral visual pathway encode numerosity with robustness to object and scene identity