ADD: Actionness-Pooled Deep-Convolutional Descriptor

Tingting Han, Hongxun Yao, Xiaoshuai Sun, Wenlong Xie, Yanhao Zhang
DOI: https://doi.org/10.1109/ICME.2018.8486535
2018-01-01
Abstract: Recognition of general actions has achieved great breakthroughs in recent years. However, real-world applications often require finer-grained action classification. The major challenge is that fine-grained actions usually share high similarity in both appearance and motion pattern, making them difficult to distinguish with existing general action representations. To solve this problem, we introduce a visual attention mechanism into the proposed descriptor, termed the Actionness-pooled Deep-convolutional Descriptor (ADD). Instead of pooling features uniformly from the entire video, we aggregate features in sub-regions that are more likely to contain actions according to actionness maps, which endows ADD with the capability of capturing the subtle differences between fine-grained actions. We conduct experiments on the HIT Dances dataset, one of the few existing datasets for fine-grained action analysis. Quantitative results demonstrate that ADD remarkably outperforms the traditional two-stream representation. Extensive experiments on two general action benchmarks, JHMDB and UCF101, additionally show that combining ADD with an end-to-end ConvNet can further boost recognition performance.
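The contrast the abstract draws between uniform pooling and actionness-guided pooling can be illustrated with a minimal sketch. The abstract does not give the exact pooling formula, so the soft weighting below is an assumption, and the function names, array shapes, and `eps` parameter are hypothetical choices made for illustration only.

```python
import numpy as np


def uniform_pool(features):
    """Average a (T, H, W, C) feature volume over every position."""
    return features.mean(axis=(0, 1, 2))


def actionness_pool(features, actionness, eps=1e-8):
    """Weighted average of features, weighted by actionness scores.

    features:   (T, H, W, C) convolutional feature maps (assumed layout)
    actionness: (T, H, W) non-negative scores, higher = more likely action
    Assumption: soft weighting; the paper may select sub-regions differently.
    """
    weights = actionness / (actionness.sum() + eps)
    return np.einsum("thw,thwc->c", weights, features)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((4, 7, 7, 16))  # toy feature volume
    actionness = rng.random((4, 7, 7))             # toy actionness maps
    print(actionness_pool(features, actionness).shape)  # (16,)
```

With a constant actionness map the weighted pool reduces to the uniform average, while a map concentrated on a few positions makes the descriptor depend almost entirely on the features at those positions.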