A Discussion of Data Sampling Strategies for Early Action Prediction

Xiaofa Liu, Xiaoli Liu, Jianqin Yin
DOI: https://doi.org/10.1007/978-981-16-9247-5_24
2022-01-01
Abstract: Action prediction aims to predict an ongoing activity from an incomplete video. It is an important branch of human activity analysis, with applications in fields such as security surveillance, human-machine interaction, and autonomous driving. Because video is temporally continuous, action sequences contain many redundant frames, which reduce computational efficiency and introduce noise into action prediction. Most existing works leverage dense sampling or sparse sampling to process video frames and characterize actions. On the one hand, dense sampling-based methods often introduce redundant noise into predictions, which easily confuses the action semantics. On the other hand, although sparse sampling-based methods can alleviate redundant noise to a certain extent, they ignore the impact of the sampling rate on action representation. In this paper, we combine the two-stream network framework and the teacher-student network framework to build an action prediction model, and discuss how different sampling rates affect action representation for partial and full videos. In this way, we can select more appropriate frames for video representation and thus achieve more accurate action prediction. The proposed method achieves state-of-the-art performance on the standard UCF101 dataset, which verifies its effectiveness.
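To make the contrast between dense and sparse sampling concrete, the following is a minimal Python sketch of how frame indices might be chosen from the observed part of a video. It is not the authors' implementation: the function names, the `observation_ratio` parameter, and the choice of taking the centre frame of each equal-length segment are assumptions made purely for illustration.

```python
def observed_length(num_frames, observation_ratio):
    """Number of frames visible so far (at least one).

    `observation_ratio` is a hypothetical parameter: the fraction of the
    full video that has been observed, e.g. 0.5 for the first half.
    """
    return max(1, int(num_frames * observation_ratio))


def dense_indices(num_frames, observation_ratio):
    """Dense sampling: keep every frame of the observed prefix."""
    return list(range(observed_length(num_frames, observation_ratio)))


def sparse_indices(num_frames, observation_ratio, num_samples):
    """Sparse sampling: split the observed prefix into `num_samples`
    equal segments and keep the centre frame of each segment.

    The centre-of-segment rule is an assumption; other rules (random
    frame per segment, fixed stride) are equally plausible.
    """
    observed = observed_length(num_frames, observation_ratio)
    # Never request more samples than there are observed frames.
    num_samples = min(num_samples, observed)
    return [(2 * i + 1) * observed // (2 * num_samples)
            for i in range(num_samples)]


# Example: a 100-frame video of which the first half has been observed.
dense = dense_indices(100, 0.5)
sparse = sparse_indices(100, 0.5, num_samples=5)
```

Varying `num_samples` for a fixed `observation_ratio` changes the sampling rate, which is the quantity whose effect on action representation the abstract says is examined.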