Global and Local Discriminative Patches Exploiting for Action Recognition

Jintao Wu, Wu Luo, Weiwei Liu, Chongyang Zhang
DOI: https://doi.org/10.1109/icassp40776.2020.9054282
2020-01-01
Abstract: Recent human action recognition models focus mainly on human features, such as pose or skeleton features. However, most of these methods pay too little attention to action-related backgrounds. In this work, we propose a novel multi-stream feature fusion framework based on discriminative patch exploitation. Unlike existing part-based or attention-based multi-stream methods, our work improves recognition accuracy by 1) exploiting global and local discriminative patches, which cover not only the acting human but also the interactive scenes; and 2) proposing an effective multi-stream feature pooling and fusion mechanism, in which 2D and 3D features from RGB frames and discriminative patches are combined to enhance spatial-temporal feature representation. Our framework is evaluated on two widely used video action benchmarks, where it outperforms other state-of-the-art methods, reaching an accuracy of up to 87.8% on HMDB51 and 98.8% on UCF101.
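The multi-stream pooling and fusion idea can be illustrated with a minimal sketch. The abstract does not specify the pooling operator, the normalization, or the fusion rule, so the temporal average pooling, L2 normalization, and concatenation used here are assumptions, as are the stream names and the helper functions `average_pool`, `l2_normalize`, and `fuse_streams`. This is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def average_pool(features: Sequence[Sequence[float]]) -> Vector:
    """Average a stream's per-frame (or per-clip) feature vectors over time.

    Assumption: the paper's actual pooling operator is not given in the abstract.
    """
    if not features:
        raise ValueError("need at least one feature vector to pool")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all vectors in a stream must share one dimension")
    return [sum(f[i] for f in features) / len(features) for i in range(dim)]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm so no single stream dominates the fusion."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_streams(streams: Dict[str, Sequence[Sequence[float]]]) -> Vector:
    """Pool each stream, normalize it, and concatenate in sorted-name order.

    Assumption: concatenation stands in for the unspecified fusion mechanism.
    """
    fused: Vector = []
    for name in sorted(streams):
        fused.extend(l2_normalize(average_pool(streams[name])))
    return fused


if __name__ == "__main__":
    # Hypothetical stream names and toy 2-dimensional features.
    toy_streams = {
        "frame_2d": [[1.0, 0.0], [0.0, 1.0]],
        "frame_3d": [[2.0, 2.0]],
        "patch_2d": [[0.0, 3.0], [0.0, 1.0]],
        "patch_3d": [[4.0, 0.0]],
    }
    video_descriptor = fuse_streams(toy_streams)
    print(len(video_descriptor))
```

In a real pipeline, each stream would hold features extracted by a 2D or 3D network from full RGB frames or from the selected patches, and the fused descriptor would feed a classifier.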