3D Convolutional Network Based Foreground Feature Fusion.

Hanjian Song, Lihua Tian, Chen Li
DOI: https://doi.org/10.1109/ism.2018.00036
2018-01-01
Abstract: With the explosion of online video, action recognition has become an important research subject. This paper investigates 3D Convolutional Networks (3D ConvNets). To address the dependence of ConvNets on multiple large-scale datasets, we propose a 3D ConvNet structure that incorporates both the original 3D-ConvNet features and foreground 3D-ConvNet features, where the foreground is obtained by fusing static object detection and motion detection. Our architecture is trained and evaluated on the standard video action benchmarks UCF-101 and HMDB-51. Experimental results demonstrate that, using only 50% of the pixels, the foreground ConvNet performs comparably to the original ConvNet. With feature fusion, we achieve 83.7% accuracy on UCF-101, exceeding the original ConvNet.
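The abstract does not give implementation details, so the following is only an illustrative sketch of the general idea: build a foreground mask from motion and static object cues, keep only the foreground pixels of a clip, and fuse the features of the original and foreground streams. Every specific here is an assumption rather than the authors' method: frame differencing stands in for motion detection, hand-supplied boxes stand in for an object detector, and fusion is done by concatenating L2-normalized feature vectors. All function names and the threshold value are hypothetical.

```python
import numpy as np

def motion_mask(clip, threshold=0.1):
    """Mark pixels whose intensity changes between consecutive frames.

    clip: float array of shape (T, H, W) with values in [0, 1].
    Returns a boolean array of shape (T, H, W).
    Assumption: simple frame differencing, not the paper's motion detector.
    """
    diff = np.abs(np.diff(clip, axis=0)) > threshold  # shape (T-1, H, W)
    mask = np.zeros(clip.shape, dtype=bool)
    # A change between frames t and t+1 marks the pixel in both frames.
    mask[:-1] |= diff
    mask[1:] |= diff
    return mask

def object_mask(shape, boxes):
    """Rasterize hypothetical detector boxes (y0, x0, y1, x1) into an (H, W) mask."""
    mask = np.zeros(shape, dtype=bool)
    for y0, x0, y1, x1 in boxes:
        mask[y0:y1, x0:x1] = True
    return mask

def foreground_clip(clip, boxes, threshold=0.1):
    """Zero out pixels that are neither moving nor inside an object box.

    Returns the masked clip and the fraction of pixels that were kept.
    """
    fg = motion_mask(clip, threshold) | object_mask(clip.shape[1:], boxes)[None]
    return clip * fg, fg.mean()

def fuse_features(original_feat, foreground_feat):
    """Fuse two feature vectors by L2-normalizing each and concatenating.

    Assumption: concatenation is one common fusion choice; the abstract
    does not say which fusion operator the paper uses.
    """
    def l2norm(v):
        return v / (np.linalg.norm(v) + 1e-12)
    return np.concatenate([l2norm(original_feat), l2norm(foreground_feat)])
```

In such a setup, the masked clip would be fed to a second 3D ConvNet, and the fused vector would go to a classifier; the kept-pixel fraction returned by foreground_clip corresponds to the kind of pixel utilization figure the abstract reports.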