Multipath 3D-Conv encoder and temporal-sequence decision for repetitive-action counting

Yicheng Qiu, Li Niu, Feng Sha
DOI: https://doi.org/10.1016/j.eswa.2024.123760
IF: 8.5
2024-03-28
Expert Systems with Applications
Abstract: Counting repetitive actions is important in work and daily life. Automated counting using deep learning provides a more efficient and accurate alternative to manual counting, which is tedious and error-prone. Deep-learning models have been proposed to automatically count repetitive actions in video content. However, for these models to be applied to realistic scenes, high-quality performance and generalization to multiple environments, particularly for long videos, are essential. To address these challenges, we propose a new model, ME-RAC, which includes a multipath 3D-Conv encoder module, together with a temporal-sequence random-combination data augmentation that improves counting performance and prevents over-fitting during training. Additionally, we propose the temporal-sequence-decision (TSD) framework to realize long repetitive-action counting in complex realistic scenes. Our experiments show that the proposed methods perform better than comparable methods and that the TSD framework achieved unique performance in long repetitive-action-counting tasks.
computer science, artificial intelligence, engineering, electrical & electronic, operations research & management science