Unsupervised video forecasting with flow parsing mechanism of human visual system
Beibei Jin,Xiaohui Song,Jindong Li,Pengfei Zhang
DOI: https://doi.org/10.1016/j.engappai.2024.108652
IF: 8
2024-05-29
Engineering Applications of Artificial Intelligence
Abstract: Video forecasting aims to predict future video frames from previously observed frames. Unlike object recognition or classification, it requires no manual labeling of the dataset, and the explosive growth of Internet video data leaves ample room for its development. It has become a research hotspot in computer vision, with broad application prospects in autonomous driving and robot navigation. However, owing to the high dimensionality and complex spatial–temporal logic of video data, current methods still produce blurry and inconsistent predictions. The "flow parsing mechanism" of the human visual system helps humans adapt systematically to new situations. Inspired by this mechanism, this paper proposes a deep flow parsing network for future video forecasting, which predicts future scenes by parsing optical flow into rigid flow and residual flow. The rigid flow represents the scene dynamics caused by the observer's ego-motion, while the residual flow corresponds to the movement of other objects in the scene. With this procedure, the model exhibits a much more comprehensive understanding of the environment and achieves top performance on competitive driving datasets, demonstrating its effectiveness and generalizability.
automation & control systems,computer science, artificial intelligence,engineering, electrical & electronic, multidisciplinary
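
The abstract's split of optical flow into rigid flow and residual flow can be illustrated with a small geometric sketch. This is not the paper's network, which the abstract does not specify. It assumes a pinhole camera with known intrinsics `K`, a known per-pixel depth map, and a known camera motion `(R, t)` between two frames; the function names `rigid_flow` and `parse_flow` are hypothetical. Under those assumptions, the rigid flow is the image motion a static scene would show from ego-motion alone, and the residual flow is whatever remains of the total optical flow.

```python
import numpy as np


def rigid_flow(depth, K, R, t):
    """Flow induced by camera ego-motion alone, assuming a static scene.

    depth: (H, W) depth map of the first frame (assumed known).
    K:     (3, 3) pinhole intrinsics (assumed known).
    R, t:  rotation (3, 3) and translation (3,) mapping points from the
           first camera frame to the second (assumed known).
    Returns an (H, W, 2) array of (du, dv) pixel displacements.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)

    # Back-project pixels to 3D points in the first camera frame.
    points = (pix @ np.linalg.inv(K).T) * depth[..., None]
    # Move the points into the second camera frame and re-project.
    moved = points @ R.T + t
    proj = moved @ K.T
    uv_new = proj[..., :2] / proj[..., 2:3]

    return uv_new - pix[..., :2]


def parse_flow(total_flow, depth, K, R, t):
    """Split total optical flow into (rigid, residual) components."""
    rigid = rigid_flow(depth, K, R, t)
    residual = total_flow - rigid
    return rigid, residual


# Usage: a fronto-parallel plane 10 units away, camera shifted along x.
K = np.array([[100.0, 0.0, 8.0],
              [0.0, 100.0, 6.0],
              [0.0, 0.0, 1.0]])
depth = np.full((12, 16), 10.0)
R = np.eye(3)
t = np.array([0.5, 0.0, 0.0])

# Pretend one pixel also moves on its own (an independently moving object).
total = rigid_flow(depth, K, R, t)
total[3, 4] += np.array([2.0, -1.0])

rigid, residual = parse_flow(total, depth, K, R, t)
```

In this toy setup the residual flow is zero everywhere except at the pixel given independent motion, which is the sense in which the residual isolates other moving objects from the observer's own motion.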