Physical Knowledge Driven Multi-scale Temporal Receptive Field Network for Compressed Video Action Recognition

Lijun He, Miao Zhang, Sijin Zhang, Fan Li
DOI: https://doi.org/10.1145/3460418.3480405
2021-01-01
Abstract: Action recognition on intelligent terminals is important to smart cities. However, existing image-based methods cannot be deployed on such terminals because they depend heavily on training data and their information extraction is highly complex. Moreover, recognizing actions of different durations remains a challenge. To address these issues, we first extend the traditional image domain to the compressed domain. This lets us efficiently extract key frames and motion vectors (MVs), a form of physical knowledge that reflects multi-scale temporal features, without complete decoding. Then, to recognize actions of different durations, we propose a multi-scale temporal receptive field network with short-term and long-term branches. It simultaneously captures three things: the instantaneous changes of an action from the extracted MVs, the long-term temporal features between adjacent key frames, and the interaction between the two. Results show that our algorithm achieves a better balance between accuracy and computational complexity.
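The idea of short-term and long-term temporal receptive fields can be illustrated with a toy example. The sketch below is not the authors' network. It assumes a hypothetical per-frame scalar `mv_energy`, for instance the mean MV magnitude of each frame, and uses fixed hand-picked kernels in place of learned ones. The short-term branch is a central difference over adjacent frames, which responds to instantaneous change. The long-term branch averages taps spaced one GOP apart (the hypothetical `gop` parameter), which roughly matches the spacing between adjacent key frames. Pairing the two responses per frame is only a stand-in for the interaction modeling described in the abstract.

```python
from typing import List, Sequence, Tuple


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of consecutive frames one output of a dilated 1-D conv can see."""
    return (kernel_size - 1) * dilation + 1


def temporal_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int = 1) -> List[float]:
    """1-D temporal convolution (cross-correlation) with zero padding.

    The output has the same length as x and is centered for odd kernel sizes.
    """
    half = (len(kernel) - 1) * dilation // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - half + i * dilation  # input frame index seen by tap i
            if 0 <= j < len(x):          # frames outside the clip count as zero
                acc += w * x[j]
        out.append(acc)
    return out


def multi_scale_features(mv_energy: Sequence[float], gop: int) -> List[Tuple[float, float]]:
    """Hypothetical two-branch sketch over per-frame MV energy.

    short-term branch: central difference over frames t-1 and t+1 (instant change)
    long-term branch: mean over frames t-gop, t, t+gop (key-frame-scale context)
    fusion: pair the two responses per frame (stand-in for learned interaction)
    """
    short = temporal_conv1d(mv_energy, [-0.5, 0.0, 0.5], dilation=1)
    long_ = temporal_conv1d(mv_energy, [1 / 3, 1 / 3, 1 / 3], dilation=gop)
    return list(zip(short, long_))


if __name__ == "__main__":
    # Toy clip: motion starts at frame 3 and stops after frame 5.
    clip = [0, 0, 0, 1, 1, 1, 0, 0]
    print(receptive_field(3, 1), receptive_field(3, 2))
    for t, (s, l) in enumerate(multi_scale_features(clip, gop=2)):
        print(t, s, round(l, 3))
```

Both branches use the same number of taps. The dilation alone widens the long-term receptive field from 3 to `(3 - 1) * gop + 1` frames, which is one common way to cover longer temporal spans without adding parameters.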