DSTC-Net: differential spatio-temporal correlation network for similar action recognition

Hongwei Chen, Shiqi He, Zexi Chen
DOI: https://doi.org/10.1007/s00530-024-01358-0
IF: 3.9
2024-05-22
Multimedia Systems
Abstract: Skeleton-based action recognition methods have made impressive progress. However, to better discriminate between similar actions, a model needs to focus on learning the difference information in spatio-temporal features, for example when two similar actions differ only subtly in their joint correlations during spatio-temporal feature extraction. In this paper, we propose a new spatio-temporal modeling method called DSTC-Net, a differential spatio-temporal correlation network for similar action recognition. One of its components dynamically learns the information within each channel and uses refinement attention (RA) to learn the critical joint connections of a local region, thus enhancing the contextual learning capability of the model. In addition, to capture the correlation between non-adjacent frames, we introduce the differential temporal (DT) module and combine it with the temporal modeling unit. This enables effective learning of the global temporal features of nodes and helps the model capture the difference information in a temporal sequence caused by differences in the direction of motion. Together, these components constitute a more detailed and comprehensive network for learning spatio-temporal information. In experiments on three datasets, NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA, our improved method demonstrates excellent performance in recognizing similar actions while introducing fewer additional parameters.
computer science, information systems, theory & methods