Abstract: Recently, many biologically inspired visual computational models have been proposed. The design of these models follows the related biological mechanisms and structures, and these models provide new solutions for visual recognition tasks. In this paper, based on recent biological evidence, we propose a framework that mimics the active and dynamic learning and recognition process of the primate visual cortex. In terms of principle, the main contribution is that the framework achieves unsupervised learning of episodic features (key components and their spatial relations) and semantic features (semantic descriptions of the key components), which together support higher-level cognition of an object. In terms of performance, the framework has the following advantages: 1) learning episodic features without supervision: for a class of objects without prior knowledge, the key components, their spatial relations, and their cover regions can be learned automatically through a deep neural network (DNN); 2) learning semantic features based on episodic features: within the cover regions of the key components, the semantic geometrical values of these components can be computed based on contour detection; 3) forming the general knowledge of a class of objects: this knowledge mainly comprises the key components, their spatial relations, and their average semantic values, and gives a concise description of the class; and 4) achieving higher-level cognition and dynamic updating: for a test image, the model can achieve classification and subclass semantic descriptions, and test samples with high confidence are selected to dynamically update the whole model. Experiments are conducted on face images, and good performance is achieved in each layer of the DNN and in the semantic description learning process. Furthermore, with its learning ability, the model can be generalized to recognition tasks for other objects.

Learning to see like children: proof of concept

Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation

A Human-in-the-loop Deep Learning Paradigm for Synergic Visual Evaluation in Children

Embodied vision for learning object representations

Visual Learning Beyond Direct Supervision

Ethosight: A Reasoning-Guided Iterative Learning System for Nuanced Perception based on Joint-Embedding & Contextual Label Affinity

Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following

Embodied Learning for Lifelong Visual Perception

Evaluating Continual Learning Algorithms by Generating 3D Virtual Environments

Self-supervised learning of video representations from a child's perspective

Markerless Visual Robot Programming by Demonstration

Computational Baby Learning

Cooperative Learning with Visual Attributes

Distilling Internet-Scale Vision-Language Models into Embodied Agents

Learning from Demonstration with Weakly Supervised Disentanglement

Biologically Inspired Model for Visual Cognition Achieving Unsupervised Episodic and Semantic Feature Learning

Learning Object Semantic Similarity with Self-Supervision

Learning Visual Features Under Motion Invariance

Towards Label-free Scene Understanding by Vision Foundation Models

Learning visual groups from co-occurrences in space and time