A Closer Look at Video Sampling for Sequential Action Recognition

Yu Zhang, Junjie Zhao, Zhengjie Chen, Siya Mi, Hongyuan Zhu, Xin Geng
DOI: https://doi.org/10.1109/tcsvt.2023.3274108
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: In recent years, sequential action recognition has attracted increasing attention because it requires long-term sequential and compositional reasoning about human actions and object interactions. Existing methods perform this reasoning either on snippets that cover a few consecutive frames or on key frames sampled from segments, so each approach is biased toward either local or global temporal information. We further find that ad-hoc training and ensembling of two separate networks, each using one of the existing sampling strategies, can easily outperform complex state-of-the-art methods, which reveals that current sampling strategies are complementary. Motivated by this observation, we propose a simple yet efficient strategy named Dense Segmental Sampling (DSS) and a novel network architecture named Temporal Dense Segment Network (TDSN) to capture the complementary information provided by DSS. TDSN achieves excellent results on benchmark action recognition datasets, which not only validates the proposed strategy but also highlights the importance of this direction for sequential video reasoning.
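The abstract contrasts two families of frame sampling: short snippets of consecutive frames (local information) and one key frame per segment (global information). The sketch below illustrates both on frame indices. The first two functions follow the general snippet-based and TSN-style segmental schemes. `dense_segmental_sampling` is a hypothetical reading of DSS inferred only from its name (a short run of consecutive frames inside every segment); it is not the authors' implementation, and all function names and parameters here are illustrative assumptions.

```python
import random
from typing import List


def dense_snippet_sampling(num_frames: int, clip_len: int, rng: random.Random) -> List[int]:
    """Local view: one snippet of `clip_len` consecutive frames at a random start.

    Indices past the last frame are clamped, so short videos repeat their final frame.
    """
    start = rng.randint(0, max(num_frames - clip_len, 0))
    return [min(start + i, num_frames - 1) for i in range(clip_len)]


def segmental_sampling(num_frames: int, num_segments: int, rng: random.Random) -> List[int]:
    """Global view (TSN-style): split the video into equal segments, pick one frame per segment."""
    indices = []
    for k in range(num_segments):
        lo = (k * num_frames) // num_segments
        # Guarantee a non-empty range even when the video has fewer frames than segments.
        hi = max(((k + 1) * num_frames) // num_segments, lo + 1)
        indices.append(rng.randrange(lo, hi))
    return indices


def dense_segmental_sampling(
    num_frames: int, num_segments: int, snippet_len: int, rng: random.Random
) -> List[int]:
    """HYPOTHETICAL reading of DSS: a run of `snippet_len` consecutive frames in each segment.

    This combines segment-level coverage (global) with consecutive frames (local).
    If a segment is shorter than `snippet_len`, the run starts at the segment's first
    frame and may extend into the next segment; indices are clamped to the last frame.
    """
    indices = []
    for k in range(num_segments):
        lo = (k * num_frames) // num_segments
        hi = max(((k + 1) * num_frames) // num_segments, lo + 1)
        start = rng.randint(lo, max(hi - snippet_len, lo))
        indices.extend(min(start + i, num_frames - 1) for i in range(snippet_len))
    return indices
```

For a 300-frame video, `segmental_sampling(300, 8, rng)` returns one index from each 37- or 38-frame segment, while `dense_segmental_sampling(300, 8, 4, rng)` returns 32 indices arranged as eight runs of four consecutive frames, one run per segment.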
engineering, electrical & electronic