DTW-CNN: time series-based human interaction prediction in videos using CNN-extracted features

Mahlagha Afrasiabi, Hassan Khotanlou, Muharram Mansoorizadeh
DOI: https://doi.org/10.1007/s00371-019-01722-6
IF: 2.835
2019-07-11
The Visual Computer
Abstract: Interaction prediction in videos has recently become an active research topic in computer vision. Its goal is to infer an interaction in its early stages, before it has been fully observed. Many approaches have been proposed, but interaction prediction remains a challenging problem. In this paper, the features are optical flow fields extracted from video frames using convolutional neural networks. Because these features are extracted from successive frames, they form a time series, and the problem is therefore modeled as time series prediction: the interaction type is predicted by matching the time series of the test video against the time series in the training set. Dynamic time warping (DTW) finds an optimal match between two time series through a nonlinear alignment of their time axes. Finally, SVM and KNN classifiers that use the DTW distance predict the video label. The results show that the proposed model improves performance on the standard interaction recognition datasets TVHI, BIT, and UT-Interaction.
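The matching step described in the abstract can be illustrated with a minimal sketch: a DTW distance between two sequences of per-frame feature vectors, and a KNN classifier that labels a partially observed video by its DTW-nearest training videos. This is not the authors' implementation. The Euclidean local cost, the `open_end` option (which lets a partial query align with any prefix of a full training sequence), the function names `dtw_distance` and `knn_predict`, and the toy data are all assumptions made for illustration. The SVM variant, which would use a kernel built from DTW distances, is not shown.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Frame = Sequence[float]   # one per-frame feature vector (assumed: e.g. a CNN descriptor)
Series = Sequence[Frame]  # a video represented as a time series of frame features


def dtw_distance(query: Series, reference: Series, open_end: bool = False) -> float:
    """DTW distance with a Euclidean local cost (an assumed choice of cost).

    With open_end=True (an assumed variant, not taken from the paper), the query
    only has to align with some prefix of the reference, which suits a video
    that has been observed only up to its early stages.
    """
    n, m = len(query), len(reference)
    # D[i][j] = cost of the best warping path aligning query[:i] with reference[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(query[i - 1], reference[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # query advances, reference frame repeats
                                 D[i][j - 1],      # reference advances, query frame repeats
                                 D[i - 1][j - 1])  # both advance
    if open_end:
        return min(D[n][1:])  # best alignment of the whole query to any reference prefix
    return D[n][m]            # both sequences aligned end to end


def knn_predict(query: Series, train: List[Tuple[Series, str]], k: int = 1,
                open_end: bool = True) -> str:
    """Label a (possibly partial) query by majority vote of its k DTW-nearest neighbours."""
    dists = sorted((dtw_distance(query, series, open_end), label)
                   for series, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 1-D toy series, not data from the paper.
train = [
    ([[0.0], [1.0], [2.0], [3.0], [4.0]], "handshake"),
    ([[4.0], [3.0], [2.0], [1.0], [0.0]], "push"),
]
observed = [[0.0], [1.0], [1.0], [2.0]]   # only the first part of a video
print(knn_predict(observed, train, k=1))  # handshake
```

The repeated frame in `observed` costs nothing under DTW, because the warping path may map two query frames to the same reference frame; a frame-by-frame Euclidean comparison would instead penalize the difference in speed.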
computer science, software engineering