Hierarchical Spatial-Temporal Network for Skeleton-Based Temporal Action Segmentation

Chenwei Tan, Tao Sun, Talas Fu, Yuhan Wang, Minjie Xu, Shenglan Liu
DOI: https://doi.org/10.1007/978-981-99-8549-4_3
2024-01-01
Abstract: Skeleton-based Temporal Action Segmentation (TAS) plays an important role in analyzing long videos of motion-centered human actions. Recent approaches model spatial and temporal information simultaneously on a single spatial-temporal topological graph, which incurs high computational cost because of the graph's large size. Additionally, multi-modal skeleton data carries rich semantic information that has not been fully explored. This paper proposes a Hierarchical Spatial-Temporal Network (HSTN) for skeleton-based TAS. In HSTN, the Multi-Branch Transfer Fusion (MBTF) module uses a multi-branch graph convolution structure with an attention mechanism to capture spatial dependencies in multi-modal skeleton data. The Multi-Scale Temporal Convolution (MSTC) module aggregates the spatial information and performs multi-scale temporal modeling to capture long-range dependencies. Extensive experiments on two challenging datasets show that the proposed method outperforms state-of-the-art (SOTA) methods.
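The abstract names the two stages of HSTN but does not define their layers, so the block below is only a minimal pure-Python sketch of the general ideas, not the authors' implementation. It assumes: a standard GCN normalization D^-1/2 (A + I) D^-1/2 of the skeleton graph; one graph-convolution branch per modality (e.g. joint and bone); a dot-product attention over branches driven by a hypothetical `query` vector; mean pooling over joints as the "spatial aggregation"; and an average of dilated depthwise temporal convolutions as the "multi-scale" temporal model. All function and parameter names (`hstn_sketch`, `fuse_branches`, `dilations`, and so on) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

# Hypothetical, illustrative sketch of a two-stage spatial -> temporal pipeline.
# It is NOT the authors' HSTN code; every layer form below is an assumption.

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def normalize_adjacency(adj: Matrix) -> Matrix:
    """Assumed GCN-style normalization D^-1/2 (A + I) D^-1/2 of the skeleton graph."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def graph_conv(x: Matrix, a_norm: Matrix, w: Matrix) -> Matrix:
    """One frame of one branch: ReLU(A_norm @ X @ W), X is V x C_in, W is C_in x C_out."""
    out = matmul(matmul(a_norm, x), w)
    return [[max(0.0, v) for v in row] for row in out]

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_branches(branch_feats: List[Matrix], query: List[float]) -> Matrix:
    """Assumed attention fusion: score each branch by <mean-over-joints feature, query>,
    softmax the scores, and return the weighted sum of the branch features."""
    scores = []
    for f in branch_feats:
        pooled = [sum(row[c] for row in f) / len(f) for c in range(len(f[0]))]
        scores.append(sum(p * q for p, q in zip(pooled, query)))
    weights = softmax(scores)
    n_joints, n_ch = len(branch_feats[0]), len(branch_feats[0][0])
    return [[sum(w * f[i][j] for w, f in zip(weights, branch_feats)) for j in range(n_ch)]
            for i in range(n_joints)]

def dilated_temporal_conv(seq: Matrix, kernel: List[float], dilation: int) -> Matrix:
    """Depthwise 1-D convolution over time (T x C) with zero 'same' padding; odd kernel."""
    t_len, n_ch, half = len(seq), len(seq[0]), len(kernel) // 2
    out = []
    for t in range(t_len):
        row = []
        for ch in range(n_ch):
            acc = 0.0
            for k, wk in enumerate(kernel):
                src = t + (k - half) * dilation
                if 0 <= src < t_len:
                    acc += wk * seq[src][ch]
            row.append(acc)
        out.append(row)
    return out

def multi_scale_temporal(seq: Matrix, kernel: List[float], dilations: List[int]) -> Matrix:
    """Average several dilated convolutions so each frame sees short and long contexts."""
    branches = [dilated_temporal_conv(seq, kernel, d) for d in dilations]
    return [[sum(b[t][ch] for b in branches) / len(branches) for ch in range(len(seq[0]))]
            for t in range(len(seq))]

def hstn_sketch(branch_inputs, adj, branch_weights, query, kernel, dilations) -> Matrix:
    """branch_inputs[b][t] is the V x C_in matrix of modality b (e.g. joint, bone) at frame t."""
    a_norm = normalize_adjacency(adj)
    per_frame = []
    for t in range(len(branch_inputs[0])):
        feats = [graph_conv(x[t], a_norm, w) for x, w in zip(branch_inputs, branch_weights)]
        fused = fuse_branches(feats, query)
        # Spatial aggregation (assumed: mean over joints) -> one C_out vector per frame.
        per_frame.append([sum(row[c] for row in fused) / len(fused) for c in range(len(fused[0]))])
    return multi_scale_temporal(per_frame, kernel, dilations)

if __name__ == "__main__":
    # Toy skeleton: 3 joints in a chain 0-1-2, 2 modalities, 5 frames, 2 channels.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    joints = [[[0.1 * t, 1.0], [0.2, 0.5 * t], [1.0, 0.0]] for t in range(5)]
    bones = [[[0.0, 0.0], [0.1, 0.1 * t], [0.3 * t, 0.2]] for t in range(5)]
    eye = [[1.0, 0.0], [0.0, 1.0]]
    out = hstn_sketch([joints, bones], adj, [eye, eye], query=[1.0, 1.0],
                      kernel=[0.25, 0.5, 0.25], dilations=[1, 2, 4])
    print(len(out), len(out[0]))  # 5 2
```

The split mirrors the hierarchy described in the abstract: the graph convolution runs per frame on a small V-node graph, and only the pooled per-frame vectors enter the temporal stage. This avoids building one large spatial-temporal graph. Larger dilations widen the temporal receptive field without adding kernel weights, which is one common way to reach long-range context.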