Two-Stream Gated Fusion ConvNets for Action Recognition

Jiagang Zhu, Wei Zou, Zheng Zhu
DOI: https://doi.org/10.1109/icpr.2018.8545639
2018-01-01
Abstract: Two-stream ConvNets for action recognition typically fuse the predictions of the two streams with a weighted averaging scheme. Because the weights are fixed, this fusion is not adapted to individual action videos, and suitable weights usually have to be found by trial and error on the validation set. To make two-stream ConvNets more adaptive, this paper proposes an end-to-end trainable gated fusion method, called the gating ConvNet, based on Mixture of Experts (MoE) theory. The gating ConvNet takes the combined convolutional layers of the spatial and temporal nets as input and outputs two fusion weights. To reduce over-fitting of the gating ConvNet caused by its redundant parameters, a new multi-task learning method is designed that jointly learns the gating fusion weights for the two streams and trains the gating ConvNet for action classification. With the proposed gated fusion method and multi-task learning approach, competitive performance is achieved on the UCF101 video action dataset.
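The abstract contrasts fixed-weight averaging with a learned, per-video gate trained under a multi-task objective. The NumPy sketch below illustrates that idea under stated assumptions and is not the authors' implementation: the gating ConvNet is replaced by a single hypothetical linear layer (`W_gate`, `b_gate`) over pooled convolutional features, a softmax turns its two outputs into fusion weights that sum to 1, and the multi-task loss is assumed to be the cross-entropy of the fused prediction plus a weighted (`lam`) cross-entropy for the gate's own action-classification head (`W_cls`, also hypothetical).

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along `axis`.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fixed_weight_fusion(p_spatial, p_temporal, w_spatial=1.0, w_temporal=1.0):
    # Conventional fusion: one weight pair shared by every video
    # (in practice tuned by trial and error on a validation set).
    return (w_spatial * p_spatial + w_temporal * p_temporal) / (w_spatial + w_temporal)

def gated_fusion(feat_spatial, feat_temporal, p_spatial, p_temporal, W_gate, b_gate):
    # Hypothetical gate: one linear layer on concatenated pooled conv features,
    # standing in for the paper's gating ConvNet.
    x = np.concatenate([feat_spatial, feat_temporal], axis=1)  # (N, Ds + Dt)
    g = softmax(x @ W_gate + b_gate)                           # (N, 2), each row sums to 1
    fused = g[:, 0:1] * p_spatial + g[:, 1:2] * p_temporal     # per-video weighting
    return fused, g

def cross_entropy(probs, labels, eps=1e-12):
    # Mean negative log-likelihood of the true class.
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))

def multitask_loss(fused_probs, gate_class_probs, labels, lam=0.5):
    # Assumed form: loss on the gated fusion output plus a weighted auxiliary
    # loss for the gating net's own action-classification head.
    return cross_entropy(fused_probs, labels) + lam * cross_entropy(gate_class_probs, labels)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, C, Ds, Dt = 4, 101, 8, 8  # 101 classes as in UCF101; other sizes are arbitrary
    p_s = softmax(rng.normal(size=(N, C)))  # spatial-stream class probabilities
    p_t = softmax(rng.normal(size=(N, C)))  # temporal-stream class probabilities
    f_s = rng.normal(size=(N, Ds))          # pooled spatial conv features
    f_t = rng.normal(size=(N, Dt))          # pooled temporal conv features
    W_gate = rng.normal(scale=0.1, size=(Ds + Dt, 2))
    b_gate = np.zeros(2)
    W_cls = rng.normal(scale=0.1, size=(Ds + Dt, C))  # hypothetical auxiliary classifier
    labels = rng.integers(0, C, size=N)

    fused, g = gated_fusion(f_s, f_t, p_s, p_t, W_gate, b_gate)
    gate_cls = softmax(np.concatenate([f_s, f_t], axis=1) @ W_cls)
    print("per-video fusion weights:\n", g.round(3))
    print("multi-task loss:", multitask_loss(fused, gate_cls, labels))
```

Setting `W_gate` and `b_gate` to zero gives every video weights of 0.5 and 0.5, which recovers plain averaging; training the gate lets the weights vary from one video to the next, which is the adaptability the abstract argues fixed weights lack.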