Recurrent issues with deep neural networks of visual recognition

Timothée Maniquet, Hans Op de Beeck, Andrea Ivan Costantino

DOI: https://doi.org/10.1101/2024.04.02.587669

2024-10-11

Abstract: Object recognition requires flexible and robust information processing, especially in view of the challenges posed by naturalistic visual settings. The ventral stream in visual cortex is provided with this robustness by its recurrent connectivity. Recurrent deep neural networks (DNNs) have recently emerged as promising models of the ventral stream. In this study, we asked whether DNNs could be used to explore the role of different recurrent computations during challenging visual recognition. We assembled a stimulus set that included manipulations that are often associated with recurrent processing in the literature, like occlusion, partial viewing, clutter, and spatial phase scrambling. We obtained a benchmark dataset from human participants performing a categorisation task on this stimulus set. By applying a wide range of model architectures to the same task, we uncovered a nuanced relationship between recurrence, model size, and performance. While recurrent models reach higher performance than their feedforward counterparts, we could not dissociate this improvement from that obtained by increasing model size. We found consistency between human and model patterns of difficulty across the visual manipulations, but this was not modulated in an obvious way by the specific type of recurrence or size added to the model. Finally, depth/size rather than recurrence makes model confusion patterns more human-like. Contrary to previous assumptions, our findings challenge the notion that recurrent models are better models of human recognition behaviour than feedforward models, and emphasise the complexity of incorporating recurrence into computational models.

Neuroscience

What problem does this paper attempt to address?

The paper asks whether recurrent deep neural networks model human visual recognition better than feedforward deep neural networks, particularly when the images are challenging. The researchers assembled a stimulus set built around manipulations that the literature often associates with recurrent processing (occlusion, partial viewing, clutter, and spatial phase scrambling) and collected benchmark data from human participants performing a categorisation task on these stimuli. They then ran a wide range of model architectures, both recurrent and feedforward, on the same task to examine the relationship between recurrence, model size, and performance. Recurrent models reached higher performance than their feedforward counterparts, but this improvement could not be dissociated from the improvement obtained by simply increasing model size. Humans and models showed consistent patterns of difficulty across the visual manipulations, yet this consistency was not modulated in any obvious way by the type of recurrence or the size added to the model. Finally, depth/size, rather than recurrence, made model confusion patterns more human-like. These findings challenge the assumption that recurrent models capture human recognition behaviour better than feedforward models, and they highlight how difficult it is to incorporate recurrence into computational models. The study therefore suggests that future work should develop more sophisticated and biologically plausible ways of implementing recurrence.
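To make the idea of comparing human and model confusion patterns concrete, here is a minimal Python sketch. It is not the authors' analysis: this page does not state which similarity metric the paper uses, so the sketch assumes one common choice, the Pearson correlation between the off-diagonal entries of row-normalised confusion matrices. All function names (`confusion_matrix`, `off_diagonal`, `pearson`, `confusion_similarity`) and the toy labels are hypothetical.

```python
from math import sqrt


def confusion_matrix(true_labels, responses, categories):
    """Row-normalised confusion matrix.

    Entry [i][j] estimates P(response = categories[j] | truth = categories[i]).
    Rows for categories that never occur as truth are left as zeros.
    """
    index = {c: k for k, c in enumerate(categories)}
    n = len(categories)
    counts = [[0] * n for _ in range(n)]
    for truth, response in zip(true_labels, responses):
        counts[index[truth]][index[response]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def off_diagonal(matrix):
    """Flatten the error cells only, dropping correct responses on the diagonal."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def confusion_similarity(true_labels, human, model, categories):
    """Assumed metric: correlation between human and model error patterns."""
    h = off_diagonal(confusion_matrix(true_labels, human, categories))
    m = off_diagonal(confusion_matrix(true_labels, model, categories))
    return pearson(h, m)


# Toy usage with made-up labels (not data from the study).
categories = ["cat", "dog", "car"]
truth = ["cat", "cat", "dog", "dog", "car", "car"]
human = ["cat", "dog", "dog", "cat", "car", "car"]   # confuses cat and dog
model = ["cat", "car", "dog", "car", "car", "car"]   # confuses animals with car
score = confusion_similarity(truth, human, model, categories)
```

Restricting the comparison to off-diagonal cells is a deliberate choice in this sketch: it keeps overall accuracy from dominating the score, so two systems only look similar if they make the same kinds of errors.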

Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior

Long-Term Recurrent Convolutional Networks for Visual Recognition and Description

Task-Driven Convolutional Recurrent Models of the Visual System

Mechanisms of human dynamic object recognition revealed by sequential deep neural networks

Recurrent Feedback Improves Recognition of Partially Occluded Objects

Characterising representation dynamics in recurrent neural networks for object recognition

Leveraging the Human Ventral Visual Stream to Improve Neural Network Robustness

Seeing eye-to-eye? A comparison of object recognition performance in humans and deep convolutional neural networks under image manipulation

Deep Neural Networks predict Hierarchical Spatio-temporal Cortical Dynamics of Human Visual Object Recognition

How well do models of visual cortex generalize to out of distribution samples?

Deep Neural Networks and Visuo-Semantic Models Explain Complementary Components of Human Ventral-Stream Representational Dynamics

Recurrent networks improve neural response prediction and provide insights into underlying cortical circuits

Recurrent Models of Visual Attention

Human Eyes Inspired Recurrent Neural Networks are More Robust Against Adversarial Noises

Performance-optimized deep neural networks are evolving into worse models of inferotemporal visual cortex

Humans and deep networks largely agree on which kinds of variation make object recognition harder

Recurrent connections facilitate occluded object recognition by explaining-away

Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition

Recurrent Feedback Improves Feedforward Representations in Deep Neural Networks