Temporal Attentive Network for Action Recognition

Yemin Shi, Yonghong Tian, Tiejun Huang, Yaowei Wang
DOI: https://doi.org/10.1109/ICME.2018.8486452
2018-01-01
Abstract: In action recognition, one of the most important challenges is to jointly exploit texture and motion information while also capturing the long-term dependencies among the various common and action-specific postures. Motivated by this, this paper proposes the Temporal Attentive Network (TAN) for action recognition. The key idea in TAN is that not all postures, each of which is represented by a small collection of consecutive frames, contribute equally to recognizing an action. To this end, TAN incorporates separate spatial and temporal streams into a single network. Information in the two streams is partially shared so that discriminative spatiotemporal features can be extracted to characterize the various postures in an action. Moreover, a temporal attention mechanism is introduced in the form of a Long Short-Term Memory (LSTM) network. With this mechanism, features from action-specific postures are emphasized, while common postures shared by many different actions are down-weighted to some extent. By jointly using such spatial and temporal information together with attentive cues in a single network, TAN achieves accuracies of 72.5% and 94.1% on two public datasets, HMDB51 and UCF101, respectively.
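The idea of LSTM-based temporal attention over postures can be illustrated with a minimal NumPy sketch. It is not TAN's actual architecture: it assumes each posture (snippet) has already been encoded as a fixed-length feature vector, uses simple concatenation of spatial and temporal features as a stand-in for the paper's partially shared two-stream design, and scores each snippet from the hidden state of a single-layer LSTM. The gate ordering, the scoring vector `v`, and all function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_hidden_states(x, W, U, b):
    """Run a single-layer LSTM over x of shape (T, D) and return hidden states (T, H).

    W: (4H, D), U: (4H, H), b: (4H,). Gate order i, f, o, g is an assumption.
    """
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    states = []
    for t in range(x.shape[0]):
        z = W @ x[t] + U @ h + b
        i = sigmoid(z[0:H])          # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        o = sigmoid(z[2 * H:3 * H])  # output gate
        g = np.tanh(z[3 * H:4 * H])  # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)

def temporal_attention_pool(features, W, U, b, v):
    """Weight each snippet by a softmax over LSTM-derived scores, then pool.

    Returns (alpha, pooled): per-snippet weights of shape (T,) summing to 1,
    and the attention-weighted video-level feature of shape (D,).
    """
    hs = lstm_hidden_states(features, W, U, b)  # (T, H)
    scores = hs @ v                             # one scalar score per snippet
    scores = scores - scores.max()              # numerical stability for softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    pooled = alpha @ features                   # weighted sum over time
    return alpha, pooled

# Usage with random stand-in features (hypothetical sizes).
rng = np.random.default_rng(0)
T, D_rgb, D_flow, H = 5, 4, 4, 6
rgb_feats = rng.standard_normal((T, D_rgb))     # spatial-stream snippet features
flow_feats = rng.standard_normal((T, D_flow))   # temporal-stream snippet features
feats = np.concatenate([rgb_feats, flow_feats], axis=1)  # naive fusion, (T, D)
D = feats.shape[1]
W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
v = rng.standard_normal(H)
alpha, pooled = temporal_attention_pool(feats, W, U, b, v)
```

Snippets with larger `alpha` dominate the pooled feature, which is the mechanism by which action-specific postures can be emphasized over common ones; in a trained model `W`, `U`, `b`, and `v` would be learned rather than random.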