Enhanced Attention Tracking with Multi-Branch Network for Egocentric Activity Recognition

Tianshan Liu, Kin-Man Lam, Rui Zhao, Jun Kong
DOI: https://doi.org/10.1109/tcsvt.2021.3104651
IF: 5.859
2022-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: The emergence of wearable devices has opened up new possibilities for egocentric activity recognition. Although some methods integrate attention mechanisms into deep neural networks to capture fine-grained human-object interactions in a weakly supervised manner, they either fail to exploit temporal consistency or generate attention from appearance cues alone. To address these limitations, we propose an enhanced attention-tracking method combined with a multi-branch network (EAT-MBNet) for egocentric activity recognition. Specifically, we propose class-aware attention maps (CAAMs) by employing a self-attention-based module to refine the class activation maps (CAMs). Our proposed method can enhance the semantic dependency between the activity categories and the feature maps. To highlight the discriminative features from the regions of interest across frames, we propose a flow-guided attention-tracking (F-AT) module that simultaneously leverages historical attention and motion patterns. Furthermore, we propose a cross-modality modeling branch based on an interactive GRU module, which captures the time-synchronized long-term relationships between the appearance and motion branches. Experimental results on four egocentric activity benchmarks demonstrate that the proposed method achieves state-of-the-art performance.
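For readers unfamiliar with class activation maps, the following is a minimal NumPy sketch of the standard CAM computation (a per-class weighted sum of the final convolutional feature maps), followed by a generic spatial self-attention smoothing step. It is an illustration under stated assumptions, not the authors' CAAM module: the function names, the tensor shapes, and the dot-product affinity are all hypothetical choices made here, since the abstract does not specify the architecture.

```python
import numpy as np

def class_activation_maps(features, fc_weights):
    """Standard CAM: for each class c, sum_k fc_weights[c, k] * features[k].

    features:   (K, H, W) feature maps from the last conv layer (assumed shape)
    fc_weights: (C, K) weights of the classifier after global average pooling
    returns:    (C, H, W) one activation map per class
    """
    return np.einsum("ck,khw->chw", fc_weights, features)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def refine_with_self_attention(cams, features):
    """Generic refinement (an assumption, not the paper's module):
    each spatial position receives an affinity-weighted average of the
    CAM values at all positions, with affinities computed from features.
    """
    K, H, W = features.shape
    flat = features.reshape(K, H * W).T                # (HW, K)
    # Scaled dot-product affinity; each row sums to 1.
    affinity = softmax(flat @ flat.T / np.sqrt(K), axis=-1)  # (HW, HW)
    C = cams.shape[0]
    refined = cams.reshape(C, H * W) @ affinity.T      # (C, HW)
    return refined.reshape(C, H, W)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((8, 7, 7))   # hypothetical K=8 channels, 7x7 grid
    weights = rng.standard_normal((5, 8))    # hypothetical C=5 activity classes
    cams = class_activation_maps(feats, weights)
    refined = refine_with_self_attention(cams, feats)
    print(cams.shape, refined.shape)
```

Because the affinity rows are normalized, the refinement step is a weighted averaging over positions with similar features, which is one plausible way to tie the class maps more closely to the feature maps.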