Abstract:We introduce dynamic predictive coding, a hierarchical model of spatiotemporal prediction and sequence learning in the neocortex. The model assumes that higher cortical levels modulate the temporal dynamics of lower levels, correcting their predictions of dynamics using prediction errors. As a result, lower levels form representations that encode sequences at shorter timescales (e.g., a single step) while higher levels form representations that encode sequences at longer timescales (e.g., an entire sequence). We tested this model using a two-level neural network, where the top-down modulation creates low-dimensional combinations of a set of learned temporal dynamics to explain input sequences. When trained on natural videos, the lower-level model neurons developed space-time receptive fields similar to those of simple cells in the primary visual cortex while the higher-level responses spanned longer timescales, mimicking temporal response hierarchies in the cortex. Additionally, the network's hierarchical sequence representation exhibited both predictive and postdictive effects resembling those observed in visual motion processing in humans (e.g., in the flash-lag illusion). When coupled with an associative memory emulating the role of the hippocampus, the model allowed episodic memories to be stored and retrieved, supporting cue-triggered recall of an input sequence similar to activity recall in the visual cortex. When extended to three hierarchical levels, the model learned progressively more abstract temporal representations along the hierarchy. Taken together, our results suggest that cortical processing and learning of sequences can be interpreted as dynamic predictive coding based on a hierarchical spatiotemporal generative model of the visual world. The brain is adept at predicting stimuli and events at multiple timescales. How do the neuronal networks in the brain achieve this remarkable capability? We propose that the neocortex employs dynamic predictive coding to learn hierarchical spatiotemporal representations. Using computer simulations, we show that when exposed to natural videos, a hierarchical neural network that minimizes prediction errors develops stable and longer timescale responses at the higher level; lower-level neurons learn space-time receptive fields similar to the receptive fields of primary visual cortical cells. The same network also exhibits several effects in visual motion processing and supports cue-triggered activity recall. Our results provide a new framework for understanding the genesis of temporal response hierarchies and activity recall in the neocortex.

Predictive Coding for Dynamic Vision : Development of Functional Hierarchy in a Multiple Spatio-Temporal Scales RNN Model

Recognition of Visually Perceived Compositional Human Actions by Multiple Spatio-Temporal Scales Recurrent Neural Networks

Dynamic predictive coding: A model of hierarchical sequence learning and prediction in the neocortex

Multiple Timescale Recurrent Neural Network with Slow Feature Analysis for Efficient Motion Recognition.

Dynamic Spatiotemporal Pattern Recognition with Recurrent Spiking Neural Network

Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for Video Prediction

Predictive Coding Networks for Temporal Prediction

PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning

DMS-GCN: Dynamic Mutiscale Spatiotemporal Graph Convolutional Networks for Human Motion Prediction

Multiscale Spatial and Temporal Learning for Human Motion Prediction

Active Predictive Coding: A Unified Neural Framework for Learning Hierarchical World Models for Perception and Planning

Predicting Long-Term Skeletal Motions by a Spatio-Temporal Hierarchical Recurrent Network.

A Spatio-Temporal Transformer Network for Human Motion Prediction in Human-Robot Collaboration

Real-time human action classification using a dynamic neural model.

PredRNN: Recurrent Neural Networks for Predictive Learning using Spatiotemporal LSTMs

Enhanced Spatial–temporal Dynamics in Pose Forecasting Through Multi-Graph Convolution Networks

Dynamic Environment Prediction in Urban Scenes using Recurrent Representation Learning

Continuous Motion Recognition Using Multiple Time Constant Recurrent Neural Network With A Deep Network Model

TrajectoryCNN: A New Spatio-Temporal Feature Learning Network for Human Motion Prediction

Multitask Non-Autoregressive Model For Human Motion Prediction