Temporal Cross-Layer Correlation Mining for Action Recognition

Linchao Zhu,Hehe Fan,Yawei Luo,Mingliang Xu,Yi Yang
DOI: https://doi.org/10.1109/tmm.2021.3057503
IF: 7.3
2022-01-01
IEEE Transactions on Multimedia
Abstract:Neighboring frames are more correlated compared to frames from further temporal distances. In this paper, we aim to explore the temporal correlations among neighboring frames and exploit cross-layer multi-scale features for action recognition. First, we present a Temporal Cross-Layer Correlation (TCLC) framework for temporal correlation learning. The unified framework uncovers both local and global structures from video data, enabling a better exploration of temporal context and assisting cross-layer spatio-temporal feature learning. Second, we propose a novel cross-layer attention and a center-guided attention mechanism to integrate features with contextual knowledge from multiple scales. Our method is a two-stage process for effective cross-layer feature learning. The first stage incorporates the cross-layer attention module to decide the importance weight of the convolutional layers. The second stage leverages the center-guided attention mechanism to aggregate local features from each layer for the generation of a final video representation. We leverage global centers to extract shared semantic knowledge among videos. We evaluate TCLC on three action recognition datasets, i.e., UCF-101, HMDB-51 and Kinetics. Our experimental results demonstrate the superiority of our proposed temporal correlation mining method.
computer science, information systems,telecommunications, software engineering
What problem does this paper attempt to address?