Temporal Pyramid Relation Network for Video-Based Gesture Recognition

Ke Yang, Rongchun Li, Peng Qiao, Qiang Wang, Dongsheng Li, Yong Dou
DOI: https://doi.org/10.1109/icip.2018.8451700
2018-01-01
Abstract: Gesture recognition in video is an important application of computer vision. However, few works address the temporal order or relations of the frames in a video, which are important for modeling gestures. In this paper, we propose the Temporal Pyramid Relation Network (TPRN), which models the temporal relations of video frames effectively and efficiently. First, a Temporal Pyramid Pooling (TPP) layer produces temporal feature sequences at multiple pyramid scales. Then, a Temporal Relation Network (TRN) is stacked on the feature sequence of each scale to model the temporal relations of video frames at that scale. Finally, the representations of all scales are aggregated to produce the final prediction. TPRN accepts video clips of varying length as input and scales with video length. We evaluate TPRN on 20BN-Jester v1, a recently released, very large video-based gesture recognition dataset, where TPRN achieves competitive performance.
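The three steps in the abstract (pyramid pooling, a relation module per scale, aggregation across scales) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the pyramid levels (1, 2, 4), the use of max pooling, the one-hidden-layer relation head over concatenated segment features, the summation used for aggregation, and all function names and sizes are assumptions made for the example. The weights are random, so the output scores carry no meaning.

```python
import numpy as np


def temporal_pyramid_pool(features, levels=(1, 2, 4)):
    """Pool (T, D) frame features into one (n, D) sequence per pyramid level n."""
    num_frames = features.shape[0]
    if num_frames < max(levels):
        raise ValueError("need at least max(levels) frames")
    pooled = []
    for n in levels:
        # Split the time axis into n contiguous, nearly equal segments.
        segments = np.array_split(features, n, axis=0)
        # Max-pool each segment over time (the pooling operator is an assumption).
        pooled.append(np.stack([seg.max(axis=0) for seg in segments]))
    return pooled


def init_params(feature_dim, num_classes, levels=(1, 2, 4), hidden=16, seed=0):
    """Random weights for one toy relation head per pyramid level (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        n: (
            0.1 * rng.standard_normal((n * feature_dim, hidden)),
            0.1 * rng.standard_normal((hidden, num_classes)),
        )
        for n in levels
    }


def relation_scores(sequence, w_hidden, w_out):
    """Toy relation head: an MLP over the ordered concatenation of segment features."""
    ordered = sequence.reshape(-1)  # row-major flattening keeps temporal order
    hidden = np.maximum(ordered @ w_hidden, 0.0)  # ReLU
    return hidden @ w_out


def tprn_forward(features, params, levels=(1, 2, 4)):
    """Score each scale with its own head, then sum the scores (aggregation is assumed)."""
    pyramid = temporal_pyramid_pool(features, levels)
    scores = [relation_scores(seq, *params[n]) for n, seq in zip(levels, pyramid)]
    return np.sum(scores, axis=0)


if __name__ == "__main__":
    params = init_params(feature_dim=8, num_classes=5)
    rng = np.random.default_rng(1)
    for num_frames in (10, 23):
        clip = rng.standard_normal((num_frames, 8))
        print(num_frames, tprn_forward(clip, params).shape)
```

Because each pyramid level always yields a fixed number of segments, clips with different frame counts map to class-score vectors of the same size, which is one way to read the abstract's claim that TPRN accepts clips of varying length.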