Temporal Relations Matter: A Two-Pathway Network for Aerial Video Recognition

Pu Jin, Lichao Mou, Yuansheng Hua, Gui-Song Xia, Xiao Xiang Zhu
DOI: https://doi.org/10.1109/igarss47720.2021.9554868
2021-01-01
Abstract: With the increasing volume of aerial videos, the demand for automatically parsing these videos is surging. To meet it, current research mainly focuses on extracting a holistic feature with convolutions along both spatial and temporal dimensions. However, these methods are limited by small temporal receptive fields and cannot adequately capture long-term temporal dependencies, which are important for describing complicated dynamics. In this paper, we propose a novel two-pathway network that models not only holistic features but also temporal relations for aerial video classification. More specifically, our model employs a two-pathway architecture: (1) a holistic representation pathway that learns a general feature of frame appearances and short-term temporal variations, and (2) a temporal relation pathway that captures multi-scale temporal relations across arbitrary frames, providing long-term temporal dependencies. Our model is evaluated on the event recognition dataset ERA and achieves state-of-the-art results, demonstrating its effectiveness and good generalization capacity.
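To make the two-pathway idea concrete, here is a minimal, untrained sketch. It is not the authors' implementation. The relation pathway follows the general Temporal Relation Network recipe: for each scale k, time-ordered k-frame tuples are concatenated, passed through a per-scale function g_k, pooled, and passed through h_k, and the per-scale results are summed. Several choices here are assumptions for illustration only: the holistic pathway is a plain temporal average standing in for a 3D-CNN feature, every k-tuple is enumerated rather than sampled, the weights are random, fusion is concatenation, and all function names are hypothetical.

```python
import itertools
import random
from typing import List

Vector = List[float]


def linear_relu(x: Vector, weights: List[Vector]) -> Vector:
    """Dense layer followed by ReLU: one output unit per row of `weights`."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def random_weights(out_dim: int, in_dim: int, rng: random.Random) -> List[Vector]:
    """Hypothetical untrained weights, used only to make the sketch runnable."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def temporal_relation_pathway(frames: List[Vector], max_scale: int,
                              out_dim: int, seed: int = 0) -> Vector:
    """Sum of multi-scale relation features for scales k = 2..max_scale.

    For each k, every time-ordered k-tuple of frames is concatenated, passed
    through g_k, averaged over tuples, and then passed through h_k.
    """
    if not 2 <= max_scale <= len(frames):
        raise ValueError("max_scale must be between 2 and the number of frames")
    rng = random.Random(seed)
    dim = len(frames[0])
    total = [0.0] * out_dim
    for k in range(2, max_scale + 1):
        g_k = random_weights(out_dim, k * dim, rng)
        h_k = random_weights(out_dim, out_dim, rng)
        # combinations() yields index tuples in increasing (temporal) order.
        tuples = list(itertools.combinations(range(len(frames)), k))
        acc = [0.0] * out_dim
        for idx in tuples:
            concat = [v for i in idx for v in frames[i]]
            acc = [a + g for a, g in zip(acc, linear_relu(concat, g_k))]
        pooled = [a / len(tuples) for a in acc]
        total = [t + r for t, r in zip(total, linear_relu(pooled, h_k))]
    return total


def holistic_pathway(frames: List[Vector]) -> Vector:
    """Stand-in for a 3D-CNN feature: temporal average of per-frame features."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def two_pathway_feature(frames: List[Vector], max_scale: int = 3,
                        rel_dim: int = 4) -> Vector:
    """Concatenate holistic and temporal-relation features (assumed fusion)."""
    return holistic_pathway(frames) + temporal_relation_pathway(frames, max_scale, rel_dim)


# Usage: six frames, each with a 3-dimensional (made-up) feature vector.
frames = [[float(t), float(t) ** 0.5, 1.0] for t in range(6)]
feature = two_pathway_feature(frames)
print(len(feature))  # 7 (3 holistic dims + 4 relation dims)
```

Enumerating all k-tuples grows combinatorially with the number of frames, which is why relation-based models typically sample a subset of tuples per scale; full enumeration is kept here only for readability.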