Cross-Stream Selective Networks for Action Recognition

Bowen Pan, Jiankai Sun, Wuwei Lin, Limin Wang, Weiyao Lin
DOI: https://doi.org/10.1109/cvprw.2019.00059
2019-01-01
Abstract: Combining multiple information streams has yielded clear improvements in video action recognition. Most existing works handle each stream independently or perform a simple combination of temporally simultaneous samples across streams, which fails to fully exploit the complementary nature of the streams because it neglects the gaps in temporal patterns among them. In this paper, we propose a cross-stream selective network (CSN) to properly integrate and evaluate information across multiple streams. The proposed CSN first introduces a local selective-sampling module (LSM), which finds asynchronous correspondences among streams and constructs highly correlated sample groups across them. The LSM can effectively deal with temporal misalignment among different streams, leading to better integration of cross-stream information. We further introduce a global adaptive weighting module (GAM), which adaptively evaluates an importance weight for each cross-stream sample group and selects the temporally more important groups for action recognition. By integrating cross-stream information, our GAM can obtain more reasonable importance weights than existing single-stream weighting schemes. Extensive experiments on the UCF101 and HMDB51 benchmark datasets demonstrate the effectiveness of our approach over previous state-of-the-art methods.
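The two-stage idea in the abstract (match samples across streams asynchronously, then weight the resulting groups) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how correspondences are found or how weights are computed. The cosine-similarity matching, the search `window`, the softmax weighting, and the names `select_groups` and `softmax_weights` are all assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def select_groups(stream_a, stream_b, window=1):
    """Assumed stand-in for local selective sampling: for each time step t
    in stream_a, pick the step in stream_b within +/- window whose feature
    is most similar, so matched samples need not be simultaneous."""
    groups = []
    for t, feat_a in enumerate(stream_a):
        lo = max(0, t - window)
        hi = min(len(stream_b), t + window + 1)
        best = max(range(lo, hi), key=lambda s: cosine(feat_a, stream_b[s]))
        groups.append((t, best, cosine(feat_a, stream_b[best])))
    return groups

def softmax_weights(scores):
    """Assumed stand-in for global adaptive weighting: normalize one score
    per cross-stream group into importance weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy per-frame features for two streams (e.g. appearance and motion).
rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
flow = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

groups = select_groups(rgb, flow, window=1)
weights = softmax_weights([score for _, _, score in groups])
pairs = [(t, s) for t, s, _ in groups]
```

In this toy input the first two frames of the second stream are swapped relative to the first, so the matching pairs step 0 with step 1 and step 1 with step 0 rather than pairing simultaneous samples. In a learned model the group scores would come from trained layers rather than raw similarity.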