Improving accuracy of temporal action detection by deep hybrid convolutional network

Ming-Gang Gan,Yan Zhang
DOI: https://doi.org/10.1007/s11042-022-13962-1
IF: 2.577
2023-04-16
Multimedia Tools and Applications
Abstract:Temporal action detection, a fundamental yet challenging task in understanding human actions, is usually divided into two stages: temporal action proposal generation and proposal classification. Classifying action proposals is always considered an action recognition task and receives little attention. However, compared with action classification, classifying action proposals has more large intra-class variations and subtle inter-class differences, making it more difficult to classify accurately. In this paper, we propose a novel end-to-end framework called Deep Hybrid Convolutional Network (DHCNet) to classify action proposals and achieve high-performance temporal action detection. DHCNet improves temporal action detection performance from three aspects. First, DHCNet utilizes Subnet I to effectively model the temporal structure of proposals and generate discriminative proposal features. Second, the Subnet II of DHCNet exploits Graph Convolution (GConv) to acquire information from other proposals and obtains much semantic information to enhance the proposal feature. Third, DHCNet adopts a coarse-to-fine cascaded classification, where the influence of large intra-class variations and subtle inter-class differences are reduced significantly at different granularities. Besides, we design an iterative boundary regression method based on closed-loop feedback to refine the temporal boundaries of proposals. Extensive experiments demonstrate the effectiveness of our approach. Furthermore, DHCNet achieves the state-of-the-art performance on the THUMOS'14 dataset(59.9% on mAP@0.5).
computer science, information systems, theory & methods,engineering, electrical & electronic, software engineering
What problem does this paper attempt to address?