Rethinking Fusion Baselines for Multi-modal Human Action Recognition

Hongda Jiang, Yanghao Li, Sijie Song, Jiaying Liu
DOI: https://doi.org/10.1007/978-3-030-00764-5_17
2018-01-01
Abstract: In this paper, we study fusion baselines for multi-modal action recognition and explore different strategies for fusing multiple streams. First, we consider early fusion, which fuses the inputs from different modalities by directly stacking them along the channel dimension. Second, we analyze late fusion, which fuses the scores from the different modal streams. Third, we explore middle fusion at different aggregation stages. In addition, we develop a modal transformation module to adaptively exploit the complementary information in the various modalities. We give a comprehensive analysis of these fusion schemes through experimental results, and we hope our work will benefit the multi-modal action recognition community.
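To make the first two schemes concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes each modality is a list of channels (each channel an H x W nested list) and that late fusion combines softmax scores by a weighted average; the abstract does not state the exact score-fusion rule, so the averaging is an assumption. The names `early_fusion`, `late_fusion`, and the toy RGB/depth inputs are hypothetical. Middle fusion and the modal transformation module are left out because the abstract gives no details on them.

```python
import math


def early_fusion(rgb, depth):
    """Stack two modalities along the channel dimension.

    Each input is a list of channels; each channel is an H x W nested list.
    """
    rgb_size = (len(rgb[0]), len(rgb[0][0]))
    depth_size = (len(depth[0]), len(depth[0][0]))
    if rgb_size != depth_size:
        raise ValueError("modalities must share the same spatial size")
    # Concatenating the channel lists gives C_rgb + C_depth channels.
    return rgb + depth


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(stream_logits, weights=None):
    """Fuse per-stream class scores with a weighted average (assumed rule)."""
    n_streams = len(stream_logits)
    if weights is None:
        weights = [1.0 / n_streams] * n_streams
    scores = [softmax(logits) for logits in stream_logits]
    n_classes = len(scores[0])
    return [
        sum(w * s[c] for w, s in zip(weights, scores))
        for c in range(n_classes)
    ]


# Toy inputs (hypothetical): a 3-channel RGB frame and a 1-channel depth map.
rgb = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
depth = [[[1.0] * 4 for _ in range(4)]]
fused_input = early_fusion(rgb, depth)  # 4 channels of size 4 x 4

# Toy per-stream logits over three action classes (hypothetical values).
fused_scores = late_fusion([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
predicted = max(range(len(fused_scores)), key=fused_scores.__getitem__)
```

In this sketch, early fusion changes only the input that a single network sees, whereas late fusion leaves each stream independent and merges only their final class scores.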