DmrNet: Dual-stream Mutual Information Contraction and Re-discrimination Network for Semi-supervised Temporal Action Detection

Qiming Zhang, Zhengping Hu, Yulu Wang, Shuai Bi, Hehao Zhang, Jirui Di
DOI: https://doi.org/10.1007/s12559-024-10374-1
IF: 4.89
2024-11-29
Cognitive Computation
Abstract: Semi-supervised temporal action detection requires only a small number of labeled samples from the dataset and uses the remaining unlabeled samples for model training, which alleviates the significant time and manpower costs of annotating large-scale temporal action detection datasets. However, previous semi-supervised temporal action detection methods perform action localization and classification sequentially, so erroneous localization predictions can easily affect the subsequent classification predictions, resulting in an error propagation problem. To overcome error propagation, we propose a dual-stream mutual information contraction and re-discrimination network (DmrNet). Specifically, we replace the traditional two-step strategy of temporal action detection with a four-step parallel strategy. First, the first-step classification prediction and the second-step localization prediction are designed as a parallel structure to prevent error propagation from localization to classification. Then, in the third step, the dual-stream mutual information contraction part maps the dual-stream features to a new vector space to ensure the cross-correlation between classification and action localization. Finally, in the fourth step, the classification re-discrimination part captures the consistency information of the dual-stream structure to enhance the internal representation. Compared with existing methods, DmrNet achieves an average accuracy improvement of 10.7% on ActivityNet v1.3 and 5.2% on THUMOS14 using only 10% of the annotated data. The experimental results show that the proposed DmrNet not only achieves good detection performance in semi-supervised learning but also achieves performance comparable to state-of-the-art methods in fully supervised learning.
computer science, artificial intelligence, neurosciences
What problem does this paper attempt to address?