Abstract: First, neurophysiological evidence for the learning of invariant representations in the inferior temporal visual cortex is described. This includes object and face representations with invariance for position, size, lighting, view and morphological transforms in the temporal lobe visual cortex; global object motion in the cortex in the superior temporal sulcus; and spatial view representations in the hippocampus that are invariant with respect to eye position, head direction, and place. Second, computational mechanisms that enable the brain to learn these invariant representations are proposed. For the ventral visual system, one key adaptation is the use of information available in the statistics of the environment in slow unsupervised learning to learn transform-invariant representations of objects. This contrasts with deep supervised learning in artificial neural networks, which uses training with thousands of exemplars forced into different categories by neuronal teachers. Similar slow learning principles apply to the learning of global object motion in the dorsal visual system leading to the cortex in the superior temporal sulcus. The learning rule that has been explored in VisNet is an associative rule with a short-term memory trace. The feed-forward architecture has four stages, with convergence from stage to stage. This type of slow learning is implemented in the brain in hierarchically organized competitive neuronal networks with convergence from stage to stage, with only 4-5 stages in the hierarchy. Slow learning is also shown to help the learning of coordinate transforms using gain modulation in the dorsal visual system extending into the parietal cortex and retrosplenial cortex. Representations are learned that are in allocentric spatial view coordinates of locations in the world and that are independent of eye position, head direction, and the place where the individual is located. This enables hippocampal spatial view cells to use idiothetic (self-motion) signals for navigation when the view details are obscured for short periods.
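The associative rule with a short-term memory trace mentioned in the abstract can be illustrated with a minimal sketch. It assumes one commonly cited form of the trace rule, in which the weight change is proportional to the input multiplied by a trace of the neuron's recent output, and the trace is a running blend of current and previous firing. The abstract does not give the rule's form or parameters, so everything below is an illustrative assumption and not the VisNet implementation: the function names (`update_trace`, `trace_rule_step`, `train`), the linear activation, the weight normalization, the parameter values `alpha` and `eta`, and the toy two-transform stimulus are all hypothetical.

```python
import math


def update_trace(y, prev_trace, eta):
    """Short-term memory trace: blend current firing y with the previous trace.

    eta = 0 gives no memory (the trace equals y, i.e. a plain Hebbian term).
    """
    return (1.0 - eta) * y + eta * prev_trace


def trace_rule_step(weights, x, prev_trace, alpha, eta):
    """One associative update for a single neuron (assumed linear activation)."""
    y = sum(w * xj for w, xj in zip(weights, x))        # neuron's response to x
    trace = update_trace(y, prev_trace, eta)
    new_w = [w + alpha * trace * xj for w, xj in zip(weights, x)]
    norm = math.sqrt(sum(w * w for w in new_w))         # keep weight vector at unit length
    if norm > 0.0:
        new_w = [w / norm for w in new_w]
    return new_w, trace


def train(eta, alpha=0.1, epochs=20):
    """Show two transforms of one hypothetical object in close temporal succession."""
    transforms = [[1.0, 0.0, 0.0, 0.0],   # transform 1 of the object
                  [0.0, 1.0, 0.0, 0.0]]   # transform 2 of the same object
    w = [0.9, 0.1, 0.1, 0.1]              # neuron initially tuned to transform 1
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for _ in range(epochs):
        trace = 0.0                        # assumption: trace is reset per sequence
        for x in transforms:
            w, trace = trace_rule_step(w, x, trace, alpha, eta)
    return w


if __name__ == "__main__":
    w_trace = train(eta=0.8)   # with a memory trace
    w_hebb = train(eta=0.0)    # plain Hebbian learning, no trace
    print("response to transform 2 with trace:   ", w_trace[1])
    print("response to transform 2 without trace:", w_hebb[1])
```

In this toy setting the trace carries activity from the first transform into the update for the second, so the neuron's weight onto the second transform grows more than it does under the plain Hebbian rule (`eta=0.0`). This is the sense in which temporal continuity in the input can bind different transforms of one object onto the same neuron.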

Learning Visual Features Under Motion Invariance

Continual Learning of Conjugated Visual Representations through Higher-order Motion Flows

Learning Features by Watching Objects Move

Invariant global motion recognition in the dorsal visual system: a unifying theory

Invariant-based Mapping of Space During General Motion of an Observer

Learning intermediate-level representations of form and motion from natural movies

A Framework for Learning Invariant Physical Relations in Multimodal Sensory Processing

Learning A Temporally Invariant Representation for Visual Tracking

Modelling Human Visual Motion Processing with Trainable Motion Energy Sensing and a Self-attention Network

Convolutional networks and applications in vision

Learning Features and their Transformations by Spatial and Temporal Spherical Clustering

Learning Transform Invariant Object Recognition in the Visual System with Multiple Stimuli Present During Training

Invariant Feature Based Automatic Motion Video Registration

Learning Invariant Object and Spatial View Representations in the Brain Using Slow Unsupervised Learning

Learning to see like children: proof of concept

Modeling Complex Motion: Photometric, Geometric, Dynamic, and Topological Aspects

Unsupervised Learning of Invariance Transformations

Achieving view-distance and -angle invariance in motion prediction using a simple network

Perception Of Transformation Invariance In The Visual Pathway

Neural Representations of Dynamic Visual Stimuli

A learning artificial visual system for motion direction detection