Multi-feature and Multi-branch Action Segmentation Framework for Modeling Long-Short-Term Dependencies

Junkun Hong, Yitian Long, Yueyi Luo, Qianqian Qi, Jun Long
DOI: https://doi.org/10.1109/icme57554.2024.10688242
2024-01-01
Abstract: Pioneering efforts have been dedicated to action segmentation, which predicts what step is occurring in each video frame. Existing studies focus on improving segmentation accuracy but neglect inter-segment temporal continuity and intra-segment semantic consistency, both of which are necessary for developing computer-assisted systems. Meanwhile, Temporal Convolutional Networks have shown good performance in action segmentation tasks, but their higher layers tend to lose fine-grained information, which degrades the results. To this end, we devise a multi-feature and multi-branch action segmentation framework for modeling long-term and short-term dependencies. Specifically, we present a multi-feature fusion to enhance temporal video representation and design a multi-branch predictor for extracting both segment-level and frame-level information. We evaluate our framework on three datasets, and experimental results demonstrate its superiority, especially on the Edit and F1 metrics, indicating that our framework is better suited to computer-assisted systems.