How Deep Neural Networks Understand Motion? Toward Interpretable Motion Modeling by Leveraging the Relative Change in Position

Hehe Fan, Tao Zhuo, Xiaoyu Feng, Guoshun Nan
DOI: https://doi.org/10.34133/icomputing.0008
2023-01-01
Abstract: Motion understanding plays an important role in video-based cross-media analysis and multiple knowledge representation learning. This paper discusses physical motion recognition and prediction by deep neural networks (DNNs), such as convolutional neural networks and recurrent neural networks. In physics, motion is the relative change in position with respect to time. To remove the influence of the moving object and of the background where the motion happens, we focus on an ideal scenario where a point moves in a plane. As the first contribution, we evaluate a few popular DNN architectures from video research on modeling the relative change in position. The experimental results and conclusions can offer insight for action recognition and video prediction. As the second contribution, we propose a vector network (VecNet) to model the relative change in position. VecNet treats the motion in a short interval as a vector. Conversely, VecNet can move a point to the corresponding position given a vector representation. To obtain a representation of motion over a long period, we use a long short-term memory (LSTM) network to aggregate or predict vector representations over time. The resulting VecNet+LSTM approach effectively supports both recognition and prediction, demonstrating that modeling relative position change is necessary for motion recognition and makes motion prediction easier.
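
The idea of motion as a relative change in position can be illustrated with a minimal NumPy sketch. This is not the authors' VecNet, which learns vector representations with a neural network. It only shows the underlying geometry for a point moving in a plane: each short interval yields a displacement vector, and applying those vectors to a starting point recovers the trajectory. The function names `displacements` and `apply_displacements` and the sample trajectory are assumptions made for illustration.

```python
import numpy as np

def displacements(positions):
    """Return per-step displacement vectors for a (T, 2) trajectory."""
    positions = np.asarray(positions, dtype=float)
    # Difference between consecutive positions: shape (T - 1, 2).
    return np.diff(positions, axis=0)

def apply_displacements(start, vectors):
    """Move a point from `start` by each vector in turn; return all positions."""
    start = np.asarray(start, dtype=float)
    vectors = np.asarray(vectors, dtype=float)
    # Cumulative sum of displacements gives the offset from the start point.
    return np.vstack([start, start + np.cumsum(vectors, axis=0)])

# Hypothetical trajectory of a point in the plane (illustrative values only).
trajectory = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 3.0], [6.0, 3.0]])

vectors = displacements(trajectory)
reconstructed = apply_displacements(trajectory[0], vectors)

# The same motion started elsewhere yields identical displacement vectors.
shifted_vectors = displacements(trajectory + 5.0)
```

In this sketch the displacement vectors are unchanged when the whole trajectory is translated, which is one way to see why a representation based on relative position change describes the motion itself rather than where it happens.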