VBNet: A Visually-Aware Biomimetic Network for Simulating the Human Eye's Visual System

Zhaofei Li, Yufan Mao, Mingshan Zhong, Jun Zhao
DOI: https://doi.org/10.1080/08839514.2024.2335100
IF: 2.777
2024-04-03
Applied Artificial Intelligence
Abstract: In the rapidly advancing realms of computer vision and artificial intelligence, the quest for human-like intelligence is escalating. Central to this pursuit is visual perception, with the human eye as a paragon of efficiency in the natural world. Recent research has prominently embraced the emulation of the human eye's visual system in computer vision. This paper introduces a pioneering approach, the visually-aware biomimetic network (VBNet), composed of a dual-branch parallel architecture: a Transformer branch emulating the peripheral retina for global feature dependencies and a CNN branch resembling the macular region for local details. Furthermore, it employs feature converter modules (CFC and TFC) to enhance information fusion between the branches. Empirical results highlight VBNet's superiority over RegNet and PVT in ImageNet classification and competitive performance in MSCOCO object detection and instance segmentation. The dual-branch design, akin to the human visual system, enables simultaneous focus on local and global features, offering fresh perspectives for future research in the field of computer vision and artificial intelligence.
computer science, artificial intelligence; engineering, electrical & electronic
What problem does this paper attempt to address?
The paper addresses how to better emulate the human visual system in computer vision. The authors propose the visually-aware biomimetic network (VBNet), an architecture that combines convolutional neural networks (CNNs) and Transformers to capture local detail and global context at the same time.

### Main Issues

1. **Fusion of Local and Global Features**: CNNs are effective at extracting local features but capture little global context, whereas Transformers model global dependencies but handle local detail less well.
2. **Improving Performance in Image Classification, Object Detection, and Instance Segmentation**: With a dual-branch parallel structure, VBNet aims to improve image classification while remaining competitive in object detection and instance segmentation.

### Solutions

- **Dual-Branch Parallel Structure**: VBNet runs a CNN branch and a Transformer branch in parallel. The CNN branch, likened to the macular region, captures local image detail; the Transformer branch, likened to the peripheral retina, extracts global feature dependencies.
- **Feature Conversion Modules**: Two feature converter modules, the Convolutional Feature Converter (CFC) and the Transformer Feature Converter (TFC), fuse the two different types of features across the branches.
- **Hierarchical Structure Design**: VBNet processes its input in multiple stages. Each stage comprises several VBNet blocks, and the channel dimension increases from one stage to the next.

With this design, VBNet outperforms RegNet and PVT on ImageNet classification and achieves competitive results in object detection and instance segmentation on the MSCOCO dataset. This indicates that the model can exploit both local and global information across a range of computer vision tasks.
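The dual-branch idea summarized above can be illustrated with a deliberately tiny, pure-Python sketch. It is not the authors' implementation: the 3x3 mean filter stands in for a convolution, the scalar self-attention stands in for a Transformer layer, and `map_to_tokens` / `tokens_to_map` are hypothetical stand-ins for the feature converters (the summary does not say which direction CFC and TFC each handle, so no mapping is claimed here). All function names are assumptions made for illustration.

```python
import math


def local_branch(fmap):
    """CNN-like branch (assumed): 3x3 mean filter with zero padding."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += fmap[y][x]
            out[i][j] = total / 9.0
    return out


def global_branch(tokens):
    """Transformer-like branch (assumed): single-head self-attention
    over scalar tokens, with query = key = value."""
    out = []
    for q in tokens:
        scores = [q * k for k in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        out.append(sum(wt * v for wt, v in zip(weights, tokens)) / norm)
    return out


def map_to_tokens(fmap):
    """Hypothetical converter: flatten an H x W map into a token sequence."""
    return [v for row in fmap for v in row]


def tokens_to_map(tokens, h, w):
    """Hypothetical converter: reshape a token sequence into an H x W map."""
    return [tokens[r * w:(r + 1) * w] for r in range(h)]


def dual_branch_block(fmap, tokens):
    """One toy block: each branch processes its own stream and receives
    the other branch's features through a converter."""
    h, w = len(fmap), len(fmap[0])
    local = local_branch(fmap)
    # Local features are converted to tokens and added to the global stream.
    fused_tokens = [t + l for t, l in zip(tokens, map_to_tokens(local))]
    new_tokens = global_branch(fused_tokens)
    # Global features are converted back to a map and added to the local stream.
    global_map = tokens_to_map(new_tokens, h, w)
    new_map = [[local[i][j] + global_map[i][j] for j in range(w)]
               for i in range(h)]
    return new_map, new_tokens


if __name__ == "__main__":
    image = [[(i * 4 + j) / 16.0 for j in range(4)] for i in range(4)]
    fmap, tokens = dual_branch_block(image, map_to_tokens(image))
    print(len(fmap), len(fmap[0]), len(tokens))
```

Stacking several such blocks per stage, and widening the channel dimension between stages, would mirror the hierarchical design described above; the single-channel toy omits channels entirely to stay short.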