Stacking-Based Attention Temporal Convolutional Network for Action Segmentation

Liu Yang, Yu Jiang, Junkun Hong, Zhenjie Wu, Zhan Yang, Jun Long
DOI: https://doi.org/10.1109/icassp49357.2023.10097024
2023-01-01
Abstract: Action segmentation plays an important role in video understanding and is implemented as frame-wise action classification. Recent works on action segmentation capture long-term dependencies by stacking more temporal convolution layers in Temporal Convolution Networks (TCNs). However, higher layers in TCNs access video features more coarsely, which loses the fine-grained information needed for frame-wise action classification. To address this issue, we propose a novel Attention-based Temporal Convolution (ATC) block that uses a self-attention mechanism to capture fine-grained temporal dependencies for frame-wise action classification. By stacking ATC blocks, we design a Stacking-based Attention Temporal Convolutional Network (SATC) that adaptively and simultaneously captures long-term and short-term dependencies, according to the semantic similarity of features across different temporal receptive fields. Experimental results show that SATC outperforms the baselines on three challenging datasets: GTEA, 50Salads, and Breakfast.
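The abstract does not specify the internals of the ATC block, so the following is only an illustrative sketch of the general idea and not the authors' implementation. It assumes that features at several temporal receptive fields come from dilated 1D convolutions, and that each frame weights those branches by scaled dot-product similarity with its own input feature. The function names, dilation rates, kernel size, and residual connection are all hypothetical choices made for this example.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Same-length dilated 1D convolution over time.
    x: (T, C_in) frame features; w: (K, C_in, C_out) with odd K."""
    T = x.shape[0]
    K = w.shape[0]
    pad = dilation * (K // 2)
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis
    out = np.zeros((T, w.shape[2]))
    for k in range(K):
        # Each kernel tap reads frames shifted by k * dilation.
        out += xp[k * dilation : k * dilation + T] @ w[k]
    return out

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_over_receptive_fields(x, weights, dilations):
    """Assumed fusion rule: per frame, attend over branches with
    different receptive fields using scaled dot-product similarity."""
    # branches: (B, T, C), one entry per dilation rate
    branches = np.stack(
        [dilated_conv1d(x, w, d) for w, d in zip(weights, dilations)]
    )
    C = x.shape[1]
    # Similarity between each frame's input feature and each branch output.
    scores = np.einsum("tc,btc->tb", x, branches) / np.sqrt(C)
    attn = softmax(scores, axis=1)  # (T, B), rows sum to 1
    fused = np.einsum("tb,btc->tc", attn, branches)
    return x + fused, attn  # residual connection (an assumption)

rng = np.random.default_rng(0)
T, C = 16, 8                      # hypothetical frame count and channels
x = rng.standard_normal((T, C))
dilations = [1, 2, 4]             # short- to long-range receptive fields
weights = [rng.standard_normal((3, C, C)) * 0.1 for _ in dilations]
out, attn = attention_over_receptive_fields(x, weights, dilations)
```

Here `attn[t]` gives, for frame `t`, how much weight is placed on the short-range versus long-range branches. Stacking several such blocks would correspond loosely to the stacking described in the abstract, again under the assumptions stated above.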