Dynamic Texture and Scene Classification by Transferring Deep Image Features

Xianbiao Qi, Chun-Guang Li, Guoying Zhao, Xiaopeng Hong, Matti Pietikainen
DOI: https://doi.org/10.1016/j.neucom.2015.07.071
2015-01-01
Abstract: Dynamic texture and scene classification are two fundamental problems in understanding natural video content. Extracting robust and effective features is a crucial step towards solving these problems. However, existing approaches suffer from sensitivity to varying illumination, viewpoint changes, or even camera motion, and/or from a lack of spatial information. Inspired by the success of deep structures in image classification, we attempt to leverage a deep structure to extract features for dynamic texture and scene classification. To tackle the challenges of training a deep structure, we propose to transfer prior knowledge from the image domain to the video domain. More specifically, we propose to apply a well-trained Convolutional Neural Network (ConvNet) as a feature extractor to extract mid-level features from each frame, and then form the video-level representation by concatenating the first- and second-order statistics of the mid-level features. We term this two-level feature extraction scheme the Transferred ConvNet Feature (TCoF). Moreover, we explore two implementations of the TCoF scheme: the spatial TCoF and the temporal TCoF. In the spatial TCoF, mean-removed frames are used as the inputs of the ConvNet, whereas in the temporal TCoF, the differences between adjacent frames are used as the inputs of the ConvNet. We systematically evaluate the proposed spatial and temporal TCoF schemes on three benchmark data sets, namely DynTex, YUPENN, and Maryland, and demonstrate that the proposed approach yields superior performance.
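The two-level TCoF scheme in the abstract (per-input ConvNet features, then first- and second-order statistics over time) can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' implementation. Several assumptions apply. `toy_extractor` is a made-up stand-in for a pretrained ConvNet layer. Frames are simplified to flat lists of pixel values. The spatial "mean" is a single scalar supplied by the caller. The second-order statistic is taken to be the per-dimension population standard deviation over time, although a covariance-based statistic would be another possible reading.

```python
import math
from typing import Callable, List, Sequence

Frame = List[float]    # one video frame, simplified to a flat list of pixel values
Feature = List[float]  # mid-level feature vector produced for one input


def spatial_inputs(frames: Sequence[Frame], mean: float) -> List[Frame]:
    """Spatial TCoF inputs: every frame with a mean value subtracted."""
    return [[p - mean for p in frame] for frame in frames]


def temporal_inputs(frames: Sequence[Frame]) -> List[Frame]:
    """Temporal TCoF inputs: differences between adjacent frames (T-1 inputs)."""
    return [[b - a for a, b in zip(prev, curr)] for prev, curr in zip(frames, frames[1:])]


def pool_first_second_order(features: Sequence[Feature]) -> Feature:
    """Concatenate the per-dimension mean (first order) and standard deviation
    (second order, an assumption here) of the features over all inputs."""
    if not features:
        raise ValueError("need at least one feature vector")
    n = len(features)
    dims = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in features) / n)  # population std
        for d in range(dims)
    ]
    return means + stds


def tcof(
    frames: Sequence[Frame],
    extractor: Callable[[Frame], Feature],
    mode: str = "spatial",
    mean: float = 0.0,
) -> Feature:
    """Video-level descriptor: features for each input, then temporal pooling."""
    if mode == "spatial":
        inputs = spatial_inputs(frames, mean)
    elif mode == "temporal":
        inputs = temporal_inputs(frames)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return pool_first_second_order([extractor(x) for x in inputs])


def toy_extractor(x: Frame) -> Feature:
    """Hypothetical stand-in for a pretrained ConvNet layer: ReLU, then max and mean."""
    relu = [max(0.0, v) for v in x]
    return [max(relu), sum(relu) / len(relu)]


if __name__ == "__main__":
    video = [[1.0, 2.0], [3.0, 4.0], [6.0, 8.0]]  # three tiny two-pixel frames
    print(tcof(video, toy_extractor, mode="temporal"))  # [3.0, 2.75, 1.0, 0.75]
    print(tcof(video, toy_extractor, mode="spatial", mean=2.0))
```

In this sketch, replacing `toy_extractor` with a real pretrained network changes only the `extractor` argument. The input preparation and the temporal pooling stay the same. The temporal variant yields one fewer input than there are frames, so it needs a clip of at least two frames.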
What problem does this paper attempt to address?