Space-Time Separate Modeling for Efficient Video Classification

Pei Cao, Shuo Wang, Jinmeng Wu, Yanbin Hao
DOI: https://doi.org/10.1088/1742-6596/2024/1/012063
2021-01-01
Journal of Physics Conference Series
Abstract: Efficient video classification requires lightweight deep neural network models. Current deep convolutional networks generally adopt 3D convolutions or similar spatio-temporal operations to process the 3D signal of videos. However, because of the heavy computational load of these 3D units, such networks are hard to train and slow at inference. To address these problems, this paper proposes a novel and efficient space-time separate (STS) modelling module. STS splits the feature channels into multiple groups and separately models different types of content information from the channel groups using different convolutional operations. Because the channel splitting mechanism significantly reduces model complexity, STS is much more lightweight than existing video models. In particular, the joint use of spatial, temporal, and spatio-temporal convolutions achieves parallel information modelling in a single block. We conduct experiments on two benchmark video datasets to evaluate the performance of STS and demonstrate its effectiveness and efficiency on the video classification task.
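To make the complexity argument concrete, the following is a rough sketch of how channel splitting can shrink the weight count of a block. It is not the authors' implementation: the three equal channel groups, the kernel shapes (1x3x3 spatial, 3x1x1 temporal, 3x3x3 spatio-temporal), the channel width of 96, and the function names are all assumptions chosen for illustration, since the abstract does not specify them.

```python
def conv_params(c_in, c_out, kt, kh, kw):
    """Weight count of a dense convolution with a kt x kh x kw kernel (bias ignored)."""
    return c_in * c_out * kt * kh * kw


def full_3d_params(channels, k=3):
    """Baseline: one k x k x k 3D convolution over all channels."""
    return conv_params(channels, channels, k, k, k)


def split_params(channels, k=3):
    """Hypothetical split block: three equal channel groups, one convolution type each.

    Assumes `channels` is divisible by 3; the grouping is illustrative only.
    """
    if channels % 3 != 0:
        raise ValueError("channels must be divisible by 3 in this sketch")
    g = channels // 3
    spatial = conv_params(g, g, 1, k, k)         # 2D convolution within each frame
    temporal = conv_params(g, g, k, 1, 1)        # 1D convolution across frames
    spatiotemporal = conv_params(g, g, k, k, k)  # full 3D convolution on one group
    return spatial + temporal + spatiotemporal


if __name__ == "__main__":
    channels = 96  # assumed width, not taken from the paper
    full = full_3d_params(channels)
    split = split_params(channels)
    print(f"full 3D conv weights: {full}")             # 248832
    print(f"split block weights:  {split}")            # 39936
    print(f"reduction factor:     {full / split:.2f}")
```

Under these assumptions the split block needs roughly one sixth of the weights of a single dense 3D convolution, because each branch only connects a third of the channels to a third of the channels and two of the three branches use smaller kernels. The actual savings in STS depend on the group sizes and operations the paper uses.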