Abstract: Humans are remarkably robust at perceiving and recognizing objects: we see faces in tea stains and recognize friends on dark streets. Yet neurocomputational models of primate object recognition have focused on the initial feedforward pass of processing through the ventral stream, and less on the top-down feedback that likely underlies robust object perception and recognition. In line with the generative approach, we propose that the visual system actively facilitates recognition by reconstructing the object hypothesized to be in the image. Top-down attention then uses this reconstruction as a template to bias feedforward processing toward the most plausible object hypothesis. Building on auto-encoder neural networks, our model forms detailed hypotheses about the appearance and location of candidate objects in the image by reconstructing a complete object representation from visual input that may be incomplete due to noise and occlusion. The model then uses the best object reconstruction, as measured by reconstruction error, to direct bottom-up processing by selectively routing low-level features, a form of top-down biasing that captures a core function of attention. We evaluated our model on the MNIST-C (handwritten digits under corruptions) and ImageNet-C (real-world objects under corruptions) datasets. Not only did our model achieve superior performance on these challenging tasks, which are designed to approximate real-world viewing conditions with noise and occlusion, but it also better accounted for human behavioral reaction times and error patterns than a standard feedforward convolutional neural network. Our model suggests that a complete understanding of object perception and recognition requires integrating top-down attentional feedback, which we propose takes the form of an object reconstruction.
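The loop described above (reconstruct each candidate object, keep the reconstruction with the lowest error, then use it as a top-down template that gates low-level features) can be illustrated with a toy sketch. This is not the authors' model, which is built on auto-encoder networks. It rests on heavy simplifying assumptions: the class-specific decoders are replaced by hypothetical fixed templates (`reconstruct` only fits a least-squares scale), reconstruction error is plain squared error, and attention is multiplicative gating by the normalized winning reconstruction. All names (`reconstruct`, `best_hypothesis`, `attention_mask`, `recognize`, `TEMPLATES`) are illustrative.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def reconstruct(x: Vector, template: Vector) -> Vector:
    """Hypothetical stand-in for a class-specific decoder: returns the
    least-squares scaled copy of the class template closest to the input."""
    denom = sum(t * t for t in template)
    scale = sum(a * t for a, t in zip(x, template)) / denom if denom else 0.0
    return [scale * t for t in template]


def reconstruction_error(x: Vector, r: Vector) -> float:
    """Sum of squared differences between the input and a reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, r))


def best_hypothesis(x: Vector, templates: Dict[str, Vector]) -> Tuple[str, Vector, float]:
    """Reconstruct every candidate object and keep the lowest-error one."""
    if not templates:
        raise ValueError("need at least one object hypothesis")
    scored = []
    for label, template in templates.items():
        r = reconstruct(x, template)
        scored.append((label, r, reconstruction_error(x, r)))
    return min(scored, key=lambda item: item[2])


def attention_mask(r: Vector) -> Vector:
    """Rescale the winning reconstruction to [0, 1] for use as a gate."""
    peak = max(abs(v) for v in r)
    return [abs(v) / peak if peak else 0.0 for v in r]


def recognize(x: Vector, templates: Dict[str, Vector], steps: int = 3):
    """Alternate hypothesis selection and top-down gating of the input.

    Returns the final label and the (label, error) chosen at each step."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    gated = list(x)
    history: List[Tuple[str, float]] = []
    for _ in range(steps):
        label, r, err = best_hypothesis(gated, templates)
        history.append((label, err))
        mask = attention_mask(r)
        # Re-read the ORIGINAL input through the top-down template, so that
        # features inconsistent with the current hypothesis are suppressed.
        gated = [a * m for a, m in zip(x, mask)]
    return history[-1][0], history


# Toy 3x3 "images", flattened row by row (hypothetical example data).
TEMPLATES = {
    "vertical": [0, 1, 0, 0, 1, 0, 0, 1, 0],
    "horizontal": [0, 0, 0, 1, 1, 1, 0, 0, 0],
}
# A vertical bar whose centre pixel is occluded, plus background noise.
X_EXAMPLE = [0.3, 1.0, 0.0, 0.0, 0.0, 0.4, 0.0, 1.0, 0.2]

if __name__ == "__main__":
    label, history = recognize(X_EXAMPLE, TEMPLATES)
    print(label, history)
```

On this toy input the vertical hypothesis wins at every step, and its reconstruction error drops after the first gating step (from about 0.96 to about 0.67) because the noisy background pixels are masked out, even though the occluded centre pixel is still missing. Gating the original input at each step, rather than the previously gated one, is one design choice among several; it keeps the top-down template from compounding its own errors across iterations.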

The attentive reconstruction of objects facilitates robust object recognition