Pillar Networks: Combining parametric with non-parametric methods for action recognition

Yu Qian, Biswa Sengupta
DOI: https://doi.org/10.1016/j.robot.2019.04.005
IF: 3.7
2019-01-01
Robotics and Autonomous Systems
Abstract: Image understanding using deep convolutional networks has reached human-level performance, yet the closely related problem of video understanding, especially action recognition, has not reached the same level of maturity. As a solution, we propose two independent architectures for action recognition using meta-classifiers: the first is based on combining kernels of support-vector machines (SVM) and the second is based on distributed Gaussian Processes (GP). Both receive features computed by a multi-stream deep convolutional neural network, enabling state-of-the-art performance on a 51-class and a 101-class action recognition problem (the HMDB-51 and UCF-101 datasets). We have named the resulting architecture ‘pillar networks’, as each (very) deep neural network acts as a pillar for the meta-classifiers. In addition, we illustrate that hand-crafted features such as improved dense trajectories (iDT) and Multi-skip Feature Stacking (MIFS), when used as additional pillars, can further improve performance.
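The kernel-combination idea in the abstract can be illustrated with a minimal sketch. It assumes that each pillar (feature stream) gets its own RBF Gram matrix and that the matrices are fused by a uniform weighted sum; the abstract does not state the authors' actual combination scheme, so the function names (`rbf_gram`, `fuse_kernels`), the `gamma` values, and the toy feature vectors below are all hypothetical.

```python
import math


def rbf_gram(X, Y, gamma):
    """Return the RBF Gram matrix K[i][j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))) for y in Y]
        for x in X
    ]


def fuse_kernels(grams, weights=None):
    """Combine per-stream Gram matrices by a weighted sum.

    Uniform weights are an assumption made for this sketch, not the
    combination rule used in the paper.
    """
    if weights is None:
        weights = [1.0 / len(grams)] * len(grams)
    rows, cols = len(grams[0]), len(grams[0][0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, grams)) for j in range(cols)]
        for i in range(rows)
    ]


# Toy features for three clips from two hypothetical pillars.
spatial = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]
temporal = [[0.0], [0.0], [2.0]]

K_fused = fuse_kernels([
    rbf_gram(spatial, spatial, gamma=1.0),
    rbf_gram(temporal, temporal, gamma=0.5),
])
```

A fused matrix of this kind could then be passed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. A non-negative weighted sum of valid kernels is itself a valid kernel, which is why each stream can contribute its own similarity measure.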