Learning and Distillating the Internal Relationship of Motion Features in Action Recognition.

Lu Lu, Siyuan Li, Niannian Chen, Lin Gao, Yong Fan, Yong Jiang, Ling Wu
DOI: https://doi.org/10.1007/978-3-030-63820-7_28
2020-01-01
Abstract: In video-based action recognition, most advanced approaches train a two-stream architecture, with an appearance stream for images and a motion stream for optical flow frames. Because computing optical flow is expensive and two-stream inference has high latency, knowledge distillation has been introduced to capture the two-stream representation efficiently while taking only RGB images as input. Following this technique, this paper proposes a novel distillation learning strategy to learn and mimic the representation of the motion stream more fully. In addition, we propose a lightweight attention-based fusion module to exploit both appearance and motion information in a unified way. Experiments show that the proposed distillation strategy and fusion module outperform the baseline technique, and that our proposal outperforms known state-of-the-art single-stream and traditional two-stream methods.