Abstract: In the rapidly advancing realms of computer vision and artificial intelligence, the quest for human-like intelligence is escalating. Central to this pursuit is visual perception, with the human eye as a paragon of efficiency in the natural world. Recent research has prominently embraced the emulation of the human eye's visual system in computer vision. This paper introduces a pioneering approach, the visually-aware biomimetic network (VBNet), composed of a dual-branch parallel architecture: a Transformer branch emulating the peripheral retina for global feature dependencies and a CNN branch resembling the macular region for local details. Furthermore, it employs feature converter modules (CFC and TFC) to enhance information fusion between the branches. Empirical results highlight VBNet's superiority over RegNet and PVT in ImageNet classification and competitive performance in MSCOCO object detection and instance segmentation. The dual-branch design, akin to the human visual system, enables simultaneous focus on local and global features, offering fresh perspectives for future research in the field of computer vision and artificial intelligence.

What problem does this paper attempt to address?

The paper addresses how to better simulate the human visual system in computer vision. The researchers propose a new architecture, the Visually-Aware Biomimetic Network (VBNet), which combines the strengths of convolutional neural networks (CNNs) and Transformers to capture local details and global context in images at the same time.

### Main Issues

1. **Fusion of Local and Global Features**: Traditional CNNs focus mainly on local features and capture global context poorly, whereas Transformers emphasize global features but handle local details less well.
2. **Improving Performance in Image Classification, Object Detection, and Instance Segmentation**: With a new dual-branch parallel structure, VBNet aims to improve image classification performance while remaining competitive in object detection and instance segmentation.

### Solutions

- **Dual-Branch Parallel Structure**: VBNet has a CNN branch and a Transformer branch. The CNN branch captures local details of the image, and the Transformer branch extracts global features.
- **Feature Conversion Modules**: To fuse the features of the two branches, the study introduces two feature conversion modules, the Convolutional Feature Converter (CFC) and the Transformer Feature Converter (TFC), which convert between the two different types of features so they can be combined.
- **Hierarchical Structure Design**: VBNet processes its input in multiple stages. Each stage comprises several VBNet blocks, and the channel dimension increases gradually from stage to stage.

With this design, VBNet performs well on the ImageNet classification task and achieves competitive results in object detection and instance segmentation on the MSCOCO dataset. This indicates that the model can make effective use of both local and global information across a range of computer vision tasks.
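To make the dual-branch idea more concrete, the following is a minimal NumPy sketch of one fused block. It is not the authors' implementation, and everything in it is an assumption made for illustration: the function names, the 3x3 mean filter standing in for the CNN branch, the projection-free self-attention standing in for the Transformer branch, and the two converters (`map_to_tokens`, `tokens_to_map`). The summary does not say how CFC and TFC work internally or which direction each one converts, so the converters here are only a plausible guess at the kind of reshaping and projection such modules need.

```python
import numpy as np

def local_branch(fmap):
    """Stand-in for the CNN branch: a 3x3 mean filter per channel (local context only)."""
    c, h, w = fmap.shape
    padded = np.pad(fmap, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros_like(fmap)
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / 9.0

def global_branch(tokens):
    """Stand-in for the Transformer branch: single-head self-attention, no learned weights."""
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ tokens  # every token mixes information from all tokens

def map_to_tokens(fmap, proj):
    """Hypothetical converter: (C, H, W) feature map -> (H*W, D) token sequence."""
    c, h, w = fmap.shape
    return fmap.reshape(c, h * w).T @ proj

def tokens_to_map(tokens, proj, h, w):
    """Hypothetical converter: (H*W, D) token sequence -> (C, H, W) feature map."""
    return (tokens @ proj).T.reshape(-1, h, w)

def dual_branch_block(fmap, tokens, c2t, t2c):
    """Run both branches in parallel, then exchange features between them."""
    _, h, w = fmap.shape
    local = local_branch(fmap)
    glob = global_branch(tokens)
    fused_tokens = glob + map_to_tokens(local, c2t)   # local detail into the global branch
    fused_map = local + tokens_to_map(glob, t2c, h, w)  # global context into the local branch
    return fused_map, fused_tokens

# Illustrative sizes only; they are not taken from the paper.
rng = np.random.default_rng(0)
C, H, W, D = 8, 4, 4, 16
fmap = rng.standard_normal((C, H, W))
tokens = rng.standard_normal((H * W, D))
c2t = rng.standard_normal((C, D)) * 0.1  # hypothetical projection, CNN channels -> token dim
t2c = rng.standard_normal((D, C)) * 0.1  # hypothetical projection, token dim -> CNN channels

fused_map, fused_tokens = dual_branch_block(fmap, tokens, c2t, t2c)
print(fused_map.shape, fused_tokens.shape)  # (8, 4, 4) (16, 16)
```

In a hierarchical design like the one described above, several such blocks would be stacked per stage, with the channel and token dimensions growing between stages. The sketch keeps both fixed for brevity.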

VBNet: A Visually-Aware Biomimetic Network for Simulating the Human Eye's Visual System

Towards Human-Leveled Vision Systems

A retina-inspired neurocomputing circuit for image representation

Seeing eye-to-eye? A comparison of object recognition performance in humans and deep convolutional neural networks under image manipulation

CVSNet: A Computer Implementation for Central Visual System of The Brain

A Biologically Inspired Neurocomputing Circuit for Image Representation

Achieving More Human Brain-Like Vision via Human EEG Representational Alignment

Energy-Efficient Visual Search by Eye Movement and Low-Latency Spiking Neural Network

Unidirectional brain-computer interface: Artificial neural network encoding natural images to fMRI response in the visual cortex

A Bio-inspired Model for Image Representation and Image Analysis

Local Image Descriptor Inspired by Visual Cortex

A Visual Perceiving and Eyeball-Motion Controlling Neural Network for Object Searching and Locating

Exploring the Brain-like Properties of Deep Neural Networks: A Neural Encoding Perspective

Deep neural networks: a new framework for modelling biological vision and brain information processing

A Novel Biologically Inspired Visual Cognition Model: Automatic Extraction of Semantics, Formation of Integrated Concepts, and Reselection Features for Ambiguity

A Multi-Layers Neural Network Model Based On Characteristics Of Ganglions' Receptive Fields In Retina And An Algorithm For Watchfulness-Keeping

A neural network based on biological vision learning and its application on robot

ViR: Towards Efficient Vision Retention Backbones

Hierarchies in Visual Pathway: Functions and Inspired Artificial Vision

Artificial Visual Network with Fully Modeled Retinal Direction-Selective Neural Pathway for Motion Direction Detection in Grayscale Scenes

Human Eyes-Inspired Recurrent Neural Networks Are More Robust Against Adversarial Noises