Abstract: Video predictive understanding encompasses a wide range of efforts concerned with anticipating the unobserved future from current and historical video observations. Action prediction is a major sub-area of video predictive understanding and is the focus of this review. This sub-area has two major subdivisions: early action recognition and future action prediction. Early action recognition is concerned with recognizing an ongoing action as soon as possible. Future action prediction is concerned with anticipating the actions that follow those previously observed. In either case, the causal relationship between past, current, and potential future information is the main focus. Various mathematical tools, such as Markov chains, Gaussian processes, auto-regressive modeling, and Bayesian recursive filtering, are widely adopted jointly with computer vision techniques for these two tasks. However, these approaches face challenges such as the curse of dimensionality, poor generalization, and constraints from domain-specific knowledge. Recently, architectures that rely on deep convolutional neural networks and recurrent neural networks have been widely proposed to improve performance on vision tasks in general and on action prediction tasks in particular. However, they have their own shortcomings, e.g., reliance on massive training data and a lack of strong theoretical underpinnings. In this survey, we start by introducing the major sub-areas of the broad area of video predictive understanding, which have recently received intensive attention and have proven to be of practical value. Next, a thorough review of various early action recognition and future action prediction algorithms is provided, organized into suitable divisions. Finally, we conclude our discussion with future research directions.

A review of video action recognition based on 3D convolution

A review of Convolutional-Neural-Network-based action recognition

A Review of Deep Learning Based Video Action Recognition Techniques

A Comprehensive Study of Deep Video Action Recognition

Human Action Recognition Using Deep Learning Methods

Video action recognition: A survey

3D Convolutional Neural Network for Action Recognition

DC3D: A Video Action Recognition Network Based on Dense Connection

Recent Progress in Appearance-based Action Recognition

A Survey on Backbones for Deep Video Action Recognition

3D-TDC: A 3D temporal dilation convolution framework for video action recognition

A Comprehensive Review of Recent Deep Learning Techniques for Human Activity Recognition

Action recognition method based on a novel keyframe extraction method and enhanced 3D convolutional neural network

3D Action Recognition Using Data Visualization and Convolutional Neural Networks

Deep Neural Networks in Video Human Action Recognition: A Review

3-Stream Convolutional Networks for Video Action Recognition with Hybrid Motion Field

Short-Term Action Recognition by 3D Convolutional Neural Network with Pixel-Wise Evidences

RGB-D Based Action Recognition with Light-weight 3D Convolutional Networks

Action Recognition By Learning Deep Multi-Granular Spatio-Temporal Video Representation

Spatiotemporal Multimodal Learning With 3D CNNs for Video Action Recognition

Review of Video Predictive Understanding: Early Action Recognition and Future Action Prediction