SG-TCN: Semantic Guidance Temporal Convolutional Network for Action Segmentation.

Yunlu Zhang, Keyan Ren, Chun Zhang, Tong Yan
DOI: https://doi.org/10.1109/ijcnn55064.2022.9891932
2022-01-01
Abstract: While recent approaches based on multi-stage temporal convolutional networks (TCNs) achieve good accuracy in action segmentation, they do not reach an excellent F1-score, which makes them difficult to apply in practice. The main issue we identify is that the TCN lacks max-pooling and therefore struggles to capture sufficient semantic information, which leads to over-segmentation. To reduce over-segmentation, we propose the Semantic Guidance module (SG), which captures high-level semantic features and guides the TCN. In addition, we consider the role of each stage in a multi-stage architecture and deploy a lighter parameter-sharing TCN (PS-TCN) as the backbone, which achieves higher accuracy with about 16% fewer parameters than the most popular backbone. Our proposed Video Speed Prediction module (VSP) further explores temporal information and improves temporal modeling ability. Combining PS-TCN with VSP and using SG for guidance yields an accurate and robust segmentation model. Extensive experiments demonstrate that our model substantially outperforms the MS-TCN++ baseline (e.g., improving F1@50 on Breakfast from 45.9% to 56.4%) and achieves state-of-the-art performance on two challenging datasets. The code is available at https://github.com/zhangylll/SG-TCN
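The contrast the abstract draws between accuracy and F1-score can be illustrated with a small sketch. It assumes one common definition of the segmental F1@k metric (a predicted segment counts as a true positive if its IoU with an unmatched ground-truth segment of the same label is at least k); the evaluation code used in the paper may differ in detail. The function names and the toy label sequences are hypothetical and are not taken from the SG-TCN repository.

```python
def get_segments(labels):
    """Collapse frame-wise labels into (label, start, end) runs; end is exclusive."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments


def frame_accuracy(pred, gt):
    """Fraction of frames whose predicted label equals the ground truth."""
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


def segmental_f1(pred, gt, iou_threshold=0.5):
    """Segmental F1 at a given IoU threshold (F1@50 when iou_threshold=0.5)."""
    pred_segs = get_segments(pred)
    gt_segs = get_segments(gt)
    matched = [False] * len(gt_segs)
    tp = fp = 0
    for label, ps, pe in pred_segs:
        best_iou, best_idx = 0.0, -1
        for j, (g_label, gs, ge) in enumerate(gt_segs):
            if g_label != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, j
        # Each ground-truth segment can be claimed by at most one prediction.
        if best_iou >= iou_threshold and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True
        else:
            fp += 1
    fn = matched.count(False)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: two ground-truth actions, and a prediction with two spurious
# one-frame flips (over-segmentation). 18 of 20 frames are still correct.
gt = ["A"] * 10 + ["B"] * 10
pred = ["A"] * 4 + ["B"] + ["A"] * 5 + ["B"] * 4 + ["A"] + ["B"] * 5

acc = frame_accuracy(pred, gt)
f1_at_50 = segmental_f1(pred, gt, iou_threshold=0.5)
```

In this toy case the two flipped frames cost only 10% of frame accuracy, but they split the prediction into six segments, four of which count as false positives, so F1@50 drops to about 0.5. This is the kind of gap between accuracy and F1 that over-segmentation produces.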